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The  Use  of  Data  Flow  Information  for  the 
Selection  and  Evaluation  of  Software  Test  Data 

Phyllis  G.  Frankl 

Elaine  J.  Wevuker,  Research  Advisor 


Tvso  families  of  software  test  data  adequacy  criteria,  each  based  on  data  flow 
analysis,  are  defined  for  programs  written  in  Pascal.  Their  formal  properties  are  investi- 
gated and  interactive  softsvare  testing  tools  based  on  them  are  described. 

The  first  of  these  families,  the  data  flow  testing  criteria,  was  previously  defined  for 
programs  written  in  a  simple  language.  We  extend  the  definitions  to  apply  to  programs 
written  in  Pascal.  The  data  flow  testing  criteria  are  based  purely  on  the  syntax  of  the  pro- 
gram being  tested.  They  require  that  the  test  data  execute  certain  paths  from  program 
points  at  which  variables  are  defined  to  program  points  at  which  those  definitions  are 
used.  We  describe  the  design  and  implementation  of  a  software  testing  tool,  ASSET, 
based  on  the  data  flow  testing  criteria. 

A  serious  weakness  of  the  data  flow  testing  criteria  is  that  for  some  programs  there 
exists  no  set  of  test  data  which  is  adequate  for  testing  the  program  according  to  these  cri- 
teria. This  problem  arises  due  to  unexecutable  paths  in  the  program.  The  second  family 
of  criteria,  the  feasible  data  flow  testing  criteria,  circumvent  this  problem  by  eliminating 
from  consideration  those  definition-use  associations  which  can  never  be  exercised.  Wc 
show  that  certain  fonnal  properties  of  the  feasible  data  flow  testing  criteria  differ 
significantly  from  those  of  the  data  flow  testing  criteria. 
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Since  it  is  undecidable  whether  a  given  set  of  test  data  satisfies  a  given  feasible  data 
flow  testing  criterion,  feasible  data  flow  testing  cannot  be  full\'  automated.  However,  it 
can  be  partially  automated.  We  describe  a  heuristic  method,  the  path  expression  method, 
which  attempts  to  determine  whether  a  given  definition-use  association  can  be  exercised. 
The  path  expression  method  is  based  on  a  combination  of  data  flow  analysis  and  sym- 
bolic evaluation.  We  introduce  a  new  symbolic  evaluation  technique  which  is  more  gen- 
eral, but  essentially  no  more  expensive,  than  symbolic  execution.  The  path  expression 
method,  along  with  ASSET,  constitute  a  tool  which  partially  automates  feasible  data  flow 
tesfin<z. 
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CHAPTER  1: 
Introduction 

1.  Background 

The  development  of  techniques  for  testing  programs  is  among  software 
engineering's  most  perplexing  problems.  Ideally,  these  techniques  should  be  amenable 
to  automation  and  should  yield  test  sets  which  are  small  enough  to  be  useful  in  practice 
yet  which  are  likely  to  expose  errors  in  the  program  being  tested.  Unfonunately,  there 
are  many  theoretical  and  practical  obstacles  to  the  development  of  such  techniques. 

It  is  often  stated  that  the  goal  of  testing  is  the  deieciion  of  errors.  That  is.  the  goal 
of  testing  is  to  find  an  input  which  causes  the  program's  output  to  differ  from  the 
specified  output.  This  distinguishes  testing  from  program  verification  in  which  one 
attempts  to  mathematically  prove  the  correctness  of  the  program.  Of  course,  if  the  pro- 
gram has  no  errors,  then  the  testing  process  can  never  achieve  this  goal.  A  more  realistic, 
though  less  precise  summar>'  o^  the  aim  of  testing  is,  "the  goal  of  testing  is  to  detect 
errors  or,  failing  that,  to  increase  confidence  that  ihe  program  is  correct." 

To  test  a  program  P,  one  chooses  a  set  T  of  possible  inputs  to  P.  runs  P  using  the 
elements  of  T  as  inputs,  and  compares  actual  outputs  to  the  expected  outputs.  If  an  error 
is  found,  then  the  testing  phase  icmiinates  and  the  debugging  phase  begins.  If  no  error  is 
found,  one  can  either  choose  additional  test  data  and  continue  the  testing  phase,  or  can 
release  the  software  to  the  user.  If  T  is  deemed  to  be  a  "good"  test  set,  then  the  fact  that 
running  the  program  on  the  elements  of  T  has  failed  to  expose  errors  in  the  program 
should  increase  contidence  that  the  program  is  correct. 

Two  key  questions  in  software  testing  research  are.  "how  should  software  test  data 
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be  selected?"  and  "how  can  one  tell  whether  a  program  has  been  tested  'enough'?".  A 
test  data  selection  method  is  a  procedure  for  choosing  test  cases  for  a  program.  A  test 
data  adequacy  criterion  is  a  predicate  used  to  evaluate  the  test  data,  that  is,  to  determine 
whether  a  program  has  been  tested  "enough".  Given  a  test  data  selection  method  M  and 
an  adequacy  criterion  A,  we  can  summarize  the  testing  process  with  the  following  algo- 
rithm: 

begin 

initialize  test  set  T  to  the  empt>'  set; 

error_detected  :=  false; 

adequately_tested  :=  false; 

while  (not  error_detected)  and  (not  adequately_tested)  do 

begin 

if  T  is  adequate  for  testing  program  P  whose  specification  is  S 

according  to  adequacy  criterion  A 
then  adequately_tested  :=  true 
else 
begin 

choose  set  T'  of  additional  test  cases 

according  to  test  data  selection  method  M: 
run  P  on  all  inputs  from  7',  comparing  outputs 

to  the  specified  outputs; 
if  error  is  detected  then  error_detected  :=  true; 
T:=Tur' 
end 
end 

if  error_detected  then  report  error 

else  release  program  P,  cenifying  that  it  has  been  adequately  tested 
according  to  criterion  A; 
end: 

It  is  possible  to  define  test  data  selection  methods  and  test  data  adequacy  criteria  in 
completely  ad  hoc  ways.  It  seems  reasonable,  however,  to  expect  that  test  data  selection 
methods  and  adequacy  criteria  which  exploit  available  information  will  yield  test  sets 
which  are  more  likely  to  expose  those  errors  which  exist  in  P. 

There  are  several  sources  of  information  available  to  aid  in  the  selection  and  evalua- 
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tion  of  software  test  data.  These  include  the  program  text,  the  program's  specification, 
and  information  about  common  types  of  errors.  Those  techniques  which  are  based  pri- 
marily on  the  program  text  are  called  program  based,  those  based  on  the  specification  are 
called  specification  based,  and  those  based  on  information  about  common  errors  are 
called  error  based.  Because  program  specifications  are  often  somewhat  vague  docu- 
ments written  in  natural  language,  and  because  of  the  scarcity  of  information  on  "com- 
mon errors",  program  based  methods  are  currently  more  amenable  to  automation  than  are 
specification  based  or  error  based  methods.  The  software  testing  strategies  examined  in 
this  thesis  are  program  based. 

Our  emphasis  in  this  thesis  will  be  on  adequacy  criteria.  Note  however  that  there  is 
a  correspondence  between  test  data  selection  methods  and  test  data  adequacy  criteria. 
Corresponding  to  an  adequacy  criterion  C,  there  is  a  test  data  selection  method  A/c, 
which,  in  effect,  says,  "select  a  test  set  which  satisfies  criterion  C".  For  example,  if  the 
adequacy  critenon  says  that  the  test  data  must  cause  the  execution  of  everj-  statement  in 
the  program,  then  the  corresponding  selection  method  would  involve  examining  the  pro- 
gram to  find  inputs  which  cause  the  execution  of  particular  statements.  Similarly, 
corresponding  to  a  selection  method  M.  there  is  an  adequacy  criterion  method  O/,  which 
says.  "A  test  set  is  adequate  if  it  has  been  produced  by  method  M".  Thus,  while  this 
thesis  will  mainly  discuss  adequacy  criteria,  the  reader  should  bear  in  mind  that  these  cri- 
teria also  implicitly  correspond  to  test  data  selection  methods. 

2.    rcrnn'ndlojjv 

Let  U  be  the  set  of  numbers  and  characters  rcprescniable  on  some  panicular  com- 
puter system    The  mpui  space,  1.  and  otirput  space,  O.  are  both  equal  to  the  set  of  finite 


sequences  of  elements  of  U.  A  specification  is  a  relation  ScIxO.  The  set  D^=  (iel  B 
oeO  such  that  (i,o)€S}  is  called  the  input  domain  of  specification  S.  A  program  P 
defines  a  panial  function  (which  by  abuse  of  notation  we  will  also  call  P)  P  :  I->0.  The 
input  domain  of  program  P  is  set  D^={iell  program  P  halts  on  input  i}  ={iell  P(i)  is 
defined).  Those  inputs  which  cause  program  P  to  crash  or  to  go  into  an  infinite  loop  are 
not  in  D^. 

Program  P  is  correct  with  respect  to  specification  S  if  and  only  if 

1)  D^eZ)'',  and 

2)  VieD'''((i,P(i))eS. 

That  is,  P  is  correct  with  respect  to  S  if  and  only  if  for  each  element  i  of  the  input  domain 
of  S,  P  halts  on  input  i,  returning  a  value  which  is  in  accordance  with  the  specification. 

A  test  set  T  for  a  program/specification  pair,  (P,S)  is  a  subset  of  the  specification's 
input  domain.  Throughout  this  thesis,  when  we  address  the  question  of  whether  a  test  set 
T  satisfies  an  adequacy  criterion,  we  will  also  assume  that  T  is  a  subset  of  the  program's 
input  domain.  This  is  a  reasonable  assumption  because  one  normally  applies  an  ade- 
quacy criterion  after  the  program  has  run  successfully  on  all  elements  of  the  test  set. 
When  no  ambiguity  results,  we  will  omit  reference  to  the  specification  S  and  assume  that 

A  program  unit's  structure  can  be  represented  abstractly  b\  its_/?ou'  graph,  a  single- 
entr\',  single  exit  directed  graph  in  which  each  node  represents  a  sequence  of  statements 
which  are  always  executed  as  a  unit  and  each  edge  represents  a  conditional  or  uncondi- 
tional transfer  of  control.  A  path  is  a  sequence  («  i ,  .  .  .  ,a!,„)  of  nodes  such  that  («,,«,+i) 
is  an  edge  for  l<i<m.   A  complete  path  is  a  path  («],...  ,«,„)  in  which  «i  is  the  entrj' 
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node  and  n^  is  the  exit  node  of  the  flow  graph.  If  G  is  the  flow  graph  of  a  program  unit 
P,  we  will  sometimes  refer  to  a  path  through  G,  as  a  path  through  P.  In  chapter  2  we  will 
give  a  more  precise  definition  of  the  flow  graphs  for  the  class  of  program  units  we  are 


considering. 


3.  Limitations  of  Testing 

We  now  take  note  of  some  o^  the  inherent  limitations  of  program  testing  as  a  valida- 
tion strategy. 

We  first  note  thai  it  is  usually  impractical  to  prove  a  program  correct  by  testing  it. 
Let  T  be  a  proper  subset  of  the  input  domain,  D^  of  a  specification  S.  Let  /oeD'^-T  be  a 
point  such  that  for  some  y,  (fo,y)^S.  Then  there  is  a  program  P  such  that  (t.P(t))€S  for 
all  teT  but  (/o.PUo))  ^S.  P  is  incorrect  with  respect  to  S,  but  this  fact  is  not  exposed  by 
testing  P  on  the  test  set  T.  Thus  the  only  way  to  prove  a  program  correct  by  testine  it  is 
by  testing  it  on  every  element  of  the  D^  (e.xcept  perhaps  those  elements  i  for  which 
(ylS(i,y))=U). 

Another  limitation  of  testing  stems  from  the  fact  that  in  order  for  testing  a  program 
to  be  meaningful,  there  must  be  some  way  of  determining  for  each  test  case  t  whether 
(t.P(t))6  S.  This  may  be  a  difficult  problem,  for  example,  if  the  amount  of  data  produced 
by  the  program  is  ver\'  large,  or  if  the  program  is  written  to  "find  the  answer"  to  some 
question  for  which  the  answer  is  not  known  [WEY82]. 

L'ndecidability  results  place  limitations  on  the  automatability  of  program  based  test- 
ing. A  node,  branch,  or  path  is  executable  if  and  only  if  there  is  some  element  of  the 
input  space  which  causes  it  to  be  executed  llic  following  undecidabiliiy  results  place 
fundamental  limitations  on  the  feasibility  of  automating  the  program  validation  process 
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[WEY79,DAV83]:  There  is  no  algorithm  to  determine  whether 

( 1 )  a  given  program  P  halts  on  a  given  input  t, 

(2)  a  panicular  statement,  branch,  or  path  is  executable, 

(3)  a  particular  path  from  the  entr)'  to  the  exit  of  a  program  is  executable. 

Problem  (1)  is  the  famous  halting  problem;  the  halting  problem  can  easily  be  reduced  to 
the  problems  in  (2);  the  unsolvability  of  (3)  follows  from  the  unsolvability  of  Hilbert's 
Tenth  Problem  [DAV73]. 

4.  Related  Work 

Several  software  test  data  adequacy  criteria  are  based  on  the  idea  that  one  cannot 
consider  a  program  to  be  adequately  tested  if  certain  sequences  of  statements  have  never 
been  executed  by  any  test  data.  These  methods,  known  as  structured  testing,  associate  a 
lest  set  T,  with  the  set  Fl  of  paths  through  program  P"s  flow  graph  which  are  executed 
when  P  is  run  with  inputs  from  T.  The  test  set  T.  or  equivalently  the  set  of  paths  n,  is 
C-adequate  for  P  if  and  only  if  each  of  the  sequences  required  by  C  is  a  subpath  of  one  of 
the  paths  in  H. 

The  strongest  of  the  structured  testing  criteria  is  path  testing,  which  requires  that 
every  path  through  the  program's  flow  graph  be  executed  by  the  test  data.  Path  testing  is 
impractical  for  all  but  the  most  trivial  programs,  since  any  program  having  a  loop  has 
infinitely  many  paths.  Nevertheless,  it  is  useful  to  study  path  testing  because  some  of  the 
limitations  of  path  testing  apply  a  fortiori  to  the  other  structured  testing  techniques. 

One  limitation  of  path  testing  is  that  it  cannot  guarantee  the  detection  of  so-called 
missing  path  errors  [G0075].  If  the  program  fails  treat  some  special  case,  there  will  be 
no  path  in  the  program  which  corresponds  to  that  special  case,  and  hence  the  test  data 
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uill  not  be  required  to  exercise  that  special  case.  In  fact,  a  similar  problem  plagues  all 
program  based  testing  strategies:  if  the  program  lacks  some  feature  which  it  is  supposed 
to  have,  analysis  of  the  program  text  will  fail  to  suggest  the  inclusion  of  test  data  aimed 
at  exercising  that  feature. 

A  second  limitation  of  path  testing  arises  due  to  coincidental  correctness.  Suppose 
that  test  cases  t  and  t'  both  cause  path  tt  to  to  be  executed,  and  that  (t,P(t))eS,  while 
(r',P(r'))^S.  Then  a  test  set  T  which  includes  t  but  not  ;'  will  execute  path  tc  without 
exposing  the  error.  For  example,  suppose  that  path  n  is  supposed  to  read  a  number, 
square  it,  and  print  the  result;  hut  instead  it  reads  a  number,  doubles  it,  and  prints  the 
result.  Then  the  input  t=2  v.  ill  execute  n  but  fail  to  expose  the  error. 

Statement  testing  and  branch  testing  are  well-known  structured  testing  methods 
which  are  weaker  than  path  testing.  Statement  testing  requires  that  even,'  statement 
(equivalently.  ever>-  node  in  the  program's  flow  graph)  be  executed,  while  branch  testing 
requires  that  ever>'  edge  in  the  program's  flow  graph  be  executed.  These  criteria  are 
based  on  the  intuition  that  one  cannot  feel  ver\'  confident  that  the  program  is  coirect  if 
some  statement  or  bran.h  has  never  been  executed  by  any  of  the  test  data.  Unfor- 
tunately, statement  and  branch  testing  can  fail  to  expose  many  common  errors 
(nO\V7S,m  A7.S|.  Several  criteria  which  are  based  on  analysis  of  the  program's  control 
flou  and  which  are  stronger  than  branch  testing  but  weaker  than  path  testing  have  been 
proposed  |HCJ\\7K.M1L74.\VOOSO]. 

More  recently,  a  number  of  le.st  data  adequacy  criteria  \Ahich  are  based  on  data  flow- 
analysis,  some  of  which  "bridge  the  gap "  between  branch  testing  and  path  testing,  have 
been  proposed  and  studied  |I1I:R76.R.\Ps:,R Ars^l.AS83.NTAS4.CLA85J.  Tools 
based  on  some  of  them  have  been   implemented  |FRA85a.FRA85b.GIR85.KOR85J. 
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These  criteria  are  based  on  the  intuition  that  one  should  not  feel  confident  that  a  variable 
has  been  assigned  the  correct  value  at  some  point  in  the  program  if  no  test  data  causes  the 
execution  of  a  path  from  the  assignment  to  a  point  where  the  variable's  value  is  subse- 
quently used. 

Rapps  and  Weyuker  [RAP82,RAP85]  defined  a  family  of  criteria  based  on  data  flow 
analysis  for  programs  written  in  a  simple  language  having  only  simple  variables, 
input/output  statements,  assignment  statements,  and  conditional  and  unconditional 
branching.  The  basic  idea  of  these  criteria  is  to  require  the  test  data  to  exercise  cenain 
paths  from  points  in  the  program  at  which  a  variable  is  defined  to  points  at  which  that 
definition  is  used.  Rapps  and  Weyuker  compared  the  various  criteria  in  this  family  to 
one  another  and  to  statement,  branch  and  path  testing.  The  family  of  criteria  defined  in 
chapter  two  of  this  thesis  is  obtained  by  extending  the  definitions  of  the  Rapps  and 
Weyuker  criteria  to  apply  to  programs  written  in  a  large  subset  of  Pascal. 

The  criterion  defined  by  Herman  is  similar  to  one  of  the  Rapps  and  Weyuker  cri- 
teria, the  "all-uses"  criterion.  Roughly  speaking,  The  criteria  defined  by  Ntafos  [NTA84] 
requires  the  test  data  to  exercise  sequences  of  length  k  of  definition-use  pairs,  where  a 
variable  .vi  is  defined  in  node  / 1  and  used  in  node  i 2.  and  variable  X2  is  defined  in  node 
j2  and  used  in  node  ;3,  etc.  The  criteria  defined  by  Laski  and  Korel  [LAS83]  are  based 
on  the  intuition  that  the  value  assigned  to  a  variable  at  a  particular  node  n  may  depend  on 
the  values  of  several  variables,  each  of  which  could  possibly  have  several  definitions 
which  reach  node  n.  The  criteria  require  the  test  data  to  exercise  paths  along  which  vari- 
ous combinations  of  these  definitions  occur.  Clarke  et.  al.  [CLA86]  compared  the  criteria 
defined  by  Rapps  and  Weyuker.  by  Ntafos  and  by  Laski  and  Korel  to  one  another. 
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An  adequacy  criterion  C  satisfies  the  applicabiliry  property  if  and  only  if  for  everv' 
program  P  there  exists  some  test  set  which  is  C-adequate  for  P  [WEY86].  One  would 
expect  a  "good"  adequacy  criterion  C  to  satisfy  the  applicability  propeny.  However, 
none  of  the  structured  testing  criteria  satisfy  the  applicability  propeny.  The  problem 
arises  due  to  the  fact  that  some  path  which  is  required  by  the  criterion  may  be  unexecut- 
able.  Furthermore,  for  each  of  the  structured  testing  criteria,  it  is  undecidable  whether  a 
test  set  exists  which  adequately  tests  a  given  program.  In  chapter  4  we  will  define  a  fam- 
ily of  adequacy  criteria  uhich  is  derived  from  the  Rapps  and  Weyuker  data  flow  testing 
criteria,  and  which  circumvents  this  problem.  A  similar  approach  could  be  taken  to 
deriving  applicable  criteria  from  the  other  structured  testing  criteria. 

5.  Thesis  Over\  icw 

In  this  thesis  we  examine  iv\o  families  of  test  data  adequacy  criteria.  For  each  of 
these  families,  we  define  the  criteria  for  programs  written  in  a  large  subset  of  Pascal,  we 
investigate  formal  properties  of  the  criteria,  and  we  discuss  software  testing  tools  based 
on  the  criteria. 

The  first  of  these  families,  the  dam  flow  testing  criteria,  is  derived  from  the  family 
of  criteria  defined  by  Rapps  and  Weyuker,  These  criteria,  which  we  define  in  section  1 
of  chapter  2,  are  based  purely  on  the  syntax  of  the  program  being  tested.  They  require 
the  test  data  to  execute  certain  paths  from  points  in  the  program  at  which  a  variables  are 
defined  to  points  where  those  definitions  are  used.  The  criteria  are  defined  in  such  a  way 
that  the  relationship  among  them  is  the  same  as  the  relationship  among  Rapps  and 
Wcyuker's  family  of  criteria. 

Weyuker  has  defined  eight  axioms  which,  she  argues,  should  be  satisfied  b>   a 
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"good"  adequacy  criterion  [\VEY861.  In  Secrion  2.2,  we  explore  some  of  the  properties 
of  the  data  flow  testing  criteria  by  examining  which  of  the  criteria  satisfy  which  of 
Weyuker's  axioms.  We  show  that  the  criteria  have  some  serious  shortcomings,  including 
failure  to  satisfy  the  applicability  property. 

In  chapter  three,  we  describe  the  design  and  implementation  of  a  tool,  ASSET, 
which  is  based  on  the  data  flow  testing  criteria.  ASSET  takes  as  input  a  program  P,  a  test 
set  T,  and  one  of  the  data  flow  testing  criteria,  C,  and  determines  whether  T  is  adequate 
according  to  C  for  testing  P. 

The  second  family  of  criteria,  the  feasible  data  flow  testing  criteria,  is  defined  in 
chapter  4,  section  1.  TTiese  criteria,  which  are  derived  from  the  data  flow  testing  criteria, 
take  into  account  some  semantic  information  about  the  program  being  tested.  The  cri- 
teria are  defined  in  such  a  way  as  to  insure  that  they  satisfy  the  applicability  propeny. 
We  show  that  that  the  relationship  among  the  family  of  feasible  data  flow  testing  criteria 
differs  significantly  from  the  relationship  among  the  family  of  data  flow  testing  criteria. 
In  section  4.2  v.e  show  that  most  of  the  feasible  data  flow  testing  criteria  satisfy  all  of 
Weyuker's  axioms. 

Unfortunately,  for  any  feasible  data  flow  testing  criterion  C  there  is  no  algorithm  to 
determine  whether  a  given  set  of  test  data  is  C-adequate  for  testing  a  given  program.  The 
problem  arises  because  in  order  to  determine  whether  T  is  C-adequate,  it  may  be  neces- 
sary' to  determine  whether  some  definition-use  association  can  ever  be  exercised.  That  is, 
it  may  be  necessary  to  determine  whether  there  is  an  executable  path,  along  which  some 
variable  v  is  not  redefined,  from  a  node  d  to  a  node  u. 

In  chapter  5  we  present  a  heuristic  which  attempts  to  detemiine  whether  any  test 
data  exists  which  exercises  a  civen  definition-use  association.  The  heuristic  is  based  on  a 
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combination  of  symix)lic  evaluation  and  data  flow  information.    This  heuristic,  along 
with  ASSET  form  the  basis  for  an  interactive  feasible  data  flow  testing  tool. 
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CHAPTER  2: 

Tht  Data  Flow  Testing  Criteria 

1.  Definitions  of  the  Data  Flow  Testing  Criteria 

A  family  of  test  data  adequacy  criteria,  based  on  analyzing  the  data  flow  charac- 
teristics of  the  program  being  tested,  was  defined  in  [RAP82,RAP85].  These  criteria, 
which  we  call  dataflow  testing  criteria,  or  DP  testing  for  short,  were  originally  defined 
for  a  very  simple  universal  programming  language  consisting  of  assignment  statements, 
conditional  and  unconditional  transfer  statements,  and  I/O  statements.  They  require  that 
the  test  data  exercise  certain  paths  from  a  point  in  a  program  where  a  variable  is  defined 
to  points  where  the  variable  is  subsequently  used.  A  tool.  ASSET,  which  performs  DP 
testing  on  programs  written  in  such  a  language  is  described  in  [FRA85a]. 

In  order  to  make  data  flow  testing  more  practical,  we  have  extended  it  to  apply  to  a 
large  subset  of  Pascal,  and  have  enhanced  ASSET  accordingly.  The  basic  ideas  behind 
data  flow  testing  apply  to  testing  programs  written  in  other  imperative  languages,  but  for 
precision  it  is  necessary  to  specify  a  panicular  syntax.  We  now  present  the  extended 
theory  of  data  flow  testing. 

We  apply  DP  testing  to  an  individual  subprogram,  i.e.,  a  main  program,  a  pro- 
cedure, or  a  function.  To  execute  a  procedure  or  function  P.  we  must  call  it  from  a  driver 
program.  Thus,  to  test  a  procedure  or  function  P  we  need  a  test-set'drivcr-program  pair 
(1  .D),  uherc  I)  is  a  program  which  might  call  P  and  T  is  a  subset  of  the  input  domain  of 
the  specification  for  D.  Obviously,  the  path  (or  paths)  through  P  which  is  executed  when 
a  panicular  test  case  is  input  to  D  will  depend  on  D,  as  well  as  on  the  test  case.  \S  c  will 
often  omit  reference  to  the  dnver  program,  when  ii  is  obvious  which  dnvcr  program  is 
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calling  the  subprogram.  Similarly,  we  may  omit  reference  to  the  driver  program  if  it  sim- 
ply reads  in  the  arguments  to  the  subprogram  in  order,  then  calls  the  subprogram  once. 

As  a  technical  convenience  we  assume  that  the  subprogram  being  tested  has  no  goto 
statements,  no  with  statements,  no  variant  records,  no  functions  having  var  parameters, 
no  procedural  or  functional  parameters,  and  no  conformant  arrays.  It  would  not  be 
difficult  to  relax  these  assumptions.  We  also  assume  that  in  every  conditional  statement, 
the  Boolean  expression  which  determines  the  flow  of  control  has  at  least  one  occurrence 
of  a  variable  or  a  call  to  the  function  eof  or  to  the  function  eoln. 

A  subprogram  can  be  uniquely  decomposed  into  a  set  of  disjoint  blocks  of  state- 
ments. A  block  is  a  maximal  sequence  of  simple  statements  having  the  properties  that  it 
can  only  be  entered  through  the  first  statement  and  that  whenever  the  first  statement  is 
executed,  the  remaining  statements  are  executed  in  the  given  order.  The  subprogram  to 
be  tested  is  represented  by  a  flow  graph  in  which  the  nodes  correspond  to  the  blocks  of 
the  subprogram  and  edges  indicate  possible  flow  of  control  between  blocks.  As  a  techni- 
cal convenience,  some  nodes  uhich  correspond  to  empty  sequences  of  statements  may 
also  be  added  to  the  flow  graph.  Figure  2.1.1  shows  the  subgraphs  corresponding  to 
statements  in  the  language.  The  subprogram's  flow  graph  is  obtained  by  merging  the  exit 
node  of  each  statement  with  tlie  entn.'  node  of  the  following  statement.  An  entr>'  node 
preceding  the  first  statement  of  the  procedure  and  an  exit  node  succeeding  the  last  state- 
ment are  added. 

We  will  call  a  flow  graph  formed  according  to  the  rules  in  Figure  2.1.1  a  structured 
flow  graph.  All  of  the  flow  graphs  considered  in  this  thesis  will  be  structured  flow 
graphs.  The  "extra"  nodes,  corresponding  to  empty  sequences  of  statements,  ha\e  been 
added  as  a  technical  convenience,  in  order  to  simplify  some  of  the  algorithms  in  Chapter 


FIGURE  2.1.1 

Control  flow  and  da:a  flow  for  statements  in  the  language 

,S.i         deno:e>  the  subgraph  corresponding  to  statement  Si. 


SI.MPLE  STATEMENTS 
.Assignment  statement:        v:=     expr; 


Node  1  has  cuscs  of  each  variable  in  -^ 

exor  followed  bv  a  definition  of  v.  ^ 


Input. Output  statements: 

read(vl vnl: 

readln(  vl vni: 

read(f.vl vnl: 

readln(f,vl vnl; 

Node  1  has  definitions  of  vl \n. 

If  the  '-le  variable  f  is  present  then  node  i 

also  has  a  c-ase  followed  bv  a  definition  of  f\  ■»■ 

_ : ^ 

w-ritefel enV  V 

wruclniel en): 

writctf.el en); 

w  ritclnff.cl en); 

Node  1  has  c-uses  of  each  variable  occurring  in  cl, en. 

If  the  file  variable  f  is  present  then  node  i 
also  has  a  definition  followed  b\  a  c-use  of  f*. 


Procedure  call:       Picl .en);  f 

Node  J  ha>  c-uscs  of  each  variable  occuring  in 

the  expressions  cl en. 

T.-c*c  arc  fo.l  wed  b>  defin:tions  of  each  actual  ^ 

parameter  w.T..n  corresponds  to  a     >Br    formal  parameter  ^ 

Nodes  ;  and  ic  are  indujed  to  assure  that  ?^ 

the  procedure  call  has  its  own  node.  v 
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REPETITIVE  STATEMENTS 


while  statement: 


while  B  do  S. 


Let  h  be  the  entry  node 
to  subgraph  S. 
Edges  (i.h)  and  (i.j)  have 
p-uses  of  each  variable  in 
the  boolean  expression  B. 


®, 


for  statement: 

for  v:=  el  to  e2  do  S; 

for  v:^  el  downto  e2  do  S; 

Let  tmp  be  a  new  variable. 
Let  f  and  g  be  the  entry 
and  exit  nodes,  respectively, 
of  S.  Node  h  has  c-uses  of 
each  variable  ;n  el, 
followed  by  a  definition  of  v  and 
c-uses  of  each  variable  m  e2 
followed  by  a  definition  of  tmp. 
Edges  (i.f)  and  fi.j)  have 
p-uses  of  V  and  tmp.  Node  g  has 
a  c-use  followed  bv  a  def  of  v. 


(J. 


repeat  statement: 

repeat  SI;...  ;Sn  until  B: 

Let  j  be  the  entry  node  of 
SI.  and  let  k  be  the  exit 
node  of  Sn. 

Edges  (k.j)  and  (k,i)  have 
p-uses  of  each  variable  in 
the  boolean  expression  B. 


(^continued  on  next  page) 


CONDITIONAL  STATEMENTS 


if-then-else  statement 


if  B  then  SI: 

■f  B  then  51  else  S2; 

Let  k  and  j  be  the  entry  nodes  of 
SI  and  52.  respectively. 
Edges  (i.j)  and  (i.k)  have 
p-uses  of  each  variable  in  the 
boolean  expression  B. 
:f  there  ;>  no    else"  part  then 
subgraph  52  has  a  single  node 
correspondmu  to  an  empty  block. 


case  el  of 

label-iistl  :  51; 


Y 
^ 


labei-listn     Sn 
end: 

Let  jl jn  be  the  entry  nodes  of 

51....,Sn.  respectively. 

Edges  (i.jl) (i.jn) 

have  p-uses  of  each  variable 
m  the  expression  el. 


<^. 
^ 


i^>i 
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5  and  Appendix  11. 

Data  flow  analysis  was  originally  used  for  compiler  optimization  [HEC77,SCH73]. 
It  generally  classifies  each  variable  occurrence  as  being  either  a  definition,  in  which  a 
value  is  stored  in  a  memory  location,  a  use,  in  which  a  value  is  fetched  from  a  memor>' 
location,  or  an  undefinition,  in  which  the  value  and  the  location  become  unbound.  For 
our  purposes,  we  will  also  distinguish  between  two  different  types  of  uses.  The  first  type 
directly  affects  the  computation  being  performed  or  outputs  the  result  of  some  earlier 
definition.  We  call  such  a  use  a  computation  use  or  a  c-use.  Of  course,  a  c-use  may 
indirectly  affect  the  flow  of  control  through  the  subprogram.  In  contrast,  the  second  type 
of  use  directly  affects  the  flow  of  control  through  the  subprogram,  and  thereby  may 
indirectly  affect  the  computations  performed.  We  call  such  a  use  ?i  predicate  use  or  p- 
use. 

We  will  associate  a  sequence  of  definitions  and  c-uses  with  each  node  in  the  flow 
graph  and  will  associate  a  set  of  p-uses  with  each  edge  in  the  flow  graph.  Figure  2.1.1 
shows  the  classification  of  variable  occurrences  in  the  language's  statements.  In  addi- 
tion, the  entry  node  is  considered  to  have  a  definition  of  each  parameter,  each  non-local 
variable  which  occurs  in  the  subprogram  and  of  the  input  buffer  input  T,  which  may 
implicitly  occur  in  calls  to  the  standard  procedures/functions  read,  readln,  eoln  and  eof. 
The  exit  node  has  an  undefinition  of  each  local  variable,  a  c-use  of  each  variable  parame- 
ter, a  c-use  of  each  non-local  variable,  and  a  c-use  of  the  input  buffer,  input  T. 

We  now  discuss  hov.  data  flow  analysis  is  handled  for  structured  variables.  Since  it 
is  not  in  general  possible  to  determine  the  particular  array  element  which  is  being  defined 
or  used  in  an  occurrence  of  an  array  variable.  v.e  will  treat  each  entire  array  as  a  single 
entitv.  If  a  is  an  arrav  variable,  anv  definition  of  the  variable  a  \e  ]  w  ill  consist  of  a  c-use 
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of  each  variable  occurring  in  the  expression  e,  followed  by  a  definition  of  a.  Any  use  of 
a  [e  ]  will  consist  of  uses  of  all  of  the  variables  occurring  in  e  followed  by  a  use  of  a. 

Similarly,  we  will  treat  pointers  purely  syntactically,  making  no  attempt  to  perform 
data  flow  analysis  on  dereferenced  pointers.  If  p  is  a  pointer  variable,  a  definition  of  p  T 
consists  of  a  c-use  of  p  followed  by  a  definition  of /^T,  and  a  use  of  pT  consists  of  a  use 
of  p  followed  by  a  use  of /?  T.  Since  it  is  not  possible  to  statically  determine  the  memor>' 
location  to  which  a  pointer  points,  we  will  ignore  the  definitions  and  uses  of  p  T. 

Each  field  of  a  record  is  treated  as  an  individual  variable.  Any  unqualified 
occurrence  of  a  record  is  treated  as  an  occurrence  of  each  field  of  the  record. 
Occurrences  of  file  variables  in  I/O  statements  are  handled  by  considering  the  effect  of 
the  statement  on  the  file  buffer. 

Note  that  our  model  of  data  flow  may  not  reflect  the  actual  data  flow  in  the  subpro- 
gram being  tested  completely  accurately.  For  example,  we  have  made  no  attempt  to  per- 
form any  interprocedural  data  flow  analysis,  have  ignored  dereferenced  pointers,  have 
made  no  attempt  to  disambiguate  array  references,  and  have  ignored  potential  aliasing 
and  side  effects.  In  an  optimizing  compiler  it  is  imperative  that  conser\'ative  assump- 
tions be  made  about  the  tlou  of  data,  lest  a  code  transformation  which  changes  the 
semantics  of  the  program  be  pcrt'ormed.  In  the  context  of  data  flow  testing,  however, 
such  caution  is  not  strictly  necessar>'.  On  the  other  hand,  it  seems  reasonable  to  expect 
that  more  accurate  data  flow  analysis  will  force  the  selection  of  better  test  data.  In 
(rR.A85bl  ^c  compare  the  test  data  needed  to  adequately  lest  programs  when  each  array 
is  treated  as  a  single  entity  to  the  test  data  required  to  adequately  lest  transformed  pro- 
grams in  which  array  references  are  di.sambiguaied  and  each  element  of  the  array  is 
treated  as  an  individual  entity.  More  exploration  of  the  trade  off  between  the  difficulty  of 
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performing  accurate  data  flow  analysis  and  the  quality  of  resulting  test  data  is  needed. 

We  are  interested  in  tracing  the  flow  of  data  between  nodes,  and  thus  define  a  c-use 
of  a  variable  x  in  node  i  to  be  a  global  c-use  if  the  value  of  x  has  been  assigned  in  some 
block  other  than  block  i.    Let  x  be  a  variable  occurring  in  a  subprogram.    A  path 

(j,«i «m<y)<  m>0,  containing  no  definitions  or  undefinitions  of  x  in  nodes  «],...,«„  is 

called  a  definition  clear  path  with  respect  to  x  (def-clear  path  wrt  x)  from  node  i  to  node  j 
and  from  node  i  to  edge  (n,^J).  A  node  i  has  a  global  definition  of  a  variable  x  if  it  has  a 
definition  of  x  and  there  is  a  def-clear  path  wrt  x  from  node  i  to  some  node  containing  a 
global  c-use  or  edge  containing  a  p-use  of  x.  Since  ever>-  p-use  is  associated  with  a 
potential  transfer  of  control  from  one  node  to  another,  there  is  no  need  to  distinguish 
between  p-uses  and  global  p-uses. 

We  restrict  the  class  of  subprograms  to  which  data  flow  testing  applies  to  those  sub- 
programs P  satisfying  the  following  two  properties: 

1.  No-Syntactic-Undefined-P-use  Property  (NSUP): 

For  ever\'  p-use  of  a  variable  x  on  an  edge  (i,j)  in  P,  there  is  some  path  from  the  start 
node  to  edge  (i,j)  which  contains  a  global  definition  of  x. 

2.  Non-Straight-Line  Propeny  (NSL): 

P  has  at  least  one  conditional  or  repetitive  statement. 

Note  that  the  NSL  property  guarantees  that  at  least  one  node  in  P"s  flow  graph  has 
more  than  one  successor,  and  that  at  least  one  variable  has  a  p-use  in  P. 

The  subprogram's  dcf-use  graph  is  obtained  from  the  flow  graph  by  associating 
with  each  node  i.  the  sets  c-u.veO\;={  variables  which  have  global  c-uses  in  block  i)  and 
rfe/0'j={ variables  which  ha\c  global  definitions  in  block  i)  and  associating  with  each 
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edge  (i,j)  the  set  /7-w5eC/,y')={  variables  which  have  p-uses  on  edge  (i,j^).  We  also  define 
sets  of  nodes  dcu(x.i )={nodc^  j  such  that  xGC-use(j)  and  there  is  a  def-clear  path  with 
respect  to  x  from  i  to  j)  and  dpu(x,i)={cdgQS  (j,k)  such  that  xGp-use(j,k)  and  there  is  a 
def-clear  path  with  respect  to  x  from  i  to  (j,k)).  These  definitions  are  summarized  in  Fig- 
ure 2.1.2. 

V  =  the  set  of  variables 

N  =  the  set  of  nodes 

E  =  the  set  of  edges 

def(i)  =  {xe  VI  x  has  a  global  definition  in  block  i} 

c-use(i)  =  (xe  VI  x  has  a  global  c-use  in  block  i) 

p-use(i,j)  =  (x€  VI  X  has  a  p-use  in  edge  (i,j)  ) 

dcu(x,i)  =  {j  €  N  I  X  e  c-use{j)  and  there  is  a  def-clear  path  v\Tt  x  from  i  to  j) 

dpu(x,i)  =  {(j,k)  e  E  I  X  €  p-use(j,k)  and  there  is  a  def-clear  path  wn  x  from  i  to  (j,k)  ) 

Figure  2.1.2 

Thus,  if  xe  def(i)  and  je  dcu(x,i)  then  x  has  a  global  definition  in  node  i  and  c-use  in  node 
j  and  there  is  a  definition  clear  path  with  respect  to  x  from  node  i  to  node  j.  Therefore  it 
may  be  possible  for  control  to  reach  node  j  with  the  variable  x  having  the  value  which 
was  assigned  to  it  in  node  i. 

A  dcfinition-c-use  association  is  a  triple  (i,j.x)  where  i  is  a  node  containing  a  global 
definition  of  x  and  je  dcu(x.i).  A  dcfinition-p-use  association  is  a  triple  (i.fj.kt.xt  where  i 
is  a  node  containing  a  global  definition  of  x  and  (j,k)  e  dpu(x.i).  A  simple  path  is  one  in 
which  all  nodes,  except  possibly  the  first  and  last,  arc  distinct.   A  loop-free  path  is  one  in 

which  all  nodes  are  distinct.  A  path  {n\ Hj.rii,)  is  a  dupath  with  respect  to  a  variable 

X  if  /J  1  has  a  global  definition  of  x  and  cither 

i)      «i  has  a  global  c-use  of  \  and  {n\ n^./Ji)  is  a  def-clcar  simple  path  with  respect  to 

\.  or 
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ii)      («y,«i)  has  a  p-use  of  x  and  {n  ],..., «^)  is  a  def-clear  loop-free  path  w.ith  respect  to  x. 

An  association  is  a  definition-c-use  association,  a  definition-p-use  association,  or  a  du- 
path. 

Recall  that  a  complete  path  is  a  path  from  the  entn,'  node  to  the  exit  node  of  the  flow 
graph.  A  complete  path  n  covers  a  definition-c-use  association  (i,j,x)  [respectively  a 
definition-p-use  association  (i,(j.k),x)]  if  it  has  a  definition  clear  subpath  with  respect  to  x 
from  i  to  j  [respectively,  from  i  to  (j,k)l.  n  covers  a  du-path  n'  if  t:'  is  a  subpath  of  K.  A 
set  n  of  complete  paths  covers  an  association  if  some  element  of  the  set  does.  A  test- 
set/driver-program  pair  (D.T)  covers  an  association  if  when  input  to  D  the  elements  of  T 
cause  the  execution  of  the  set  of  paths  H,  and  H  covers  the  association. 

Roughly  speaking,  the  family  of  data  flow  testing  criteria  is  based  on  requiring  that 
the  test  data  execute  definition  clear  paths  from  each  node  containing  a  global  definition 
of  a  variable  to  specified  nodes  containing  global  c-uses  and  edges  containing  p-uses  of 

definition  clear 


that  variable.   For  each  variable  definition  we  can  demand  that      ^ 

isome_ 


all 
some 


paths  with  respect  to  that  variable  from  the  node  containing  the  definition  to 

reachable  by  some  such  path  be  executed.   The  criteria  are  defined  pre 
cisely  in  Figure  2.1.3. 


of 


the 


uses 

c  -uses 

_p  -uses. 


If  variable  x  is  has  a  global  definition  in  node  i,  the  all-defs  criterion  requires  the  test 
data  to  exerci.se  some  path  which  goes  from  node  i  to  some  node  or  edge  at  which  the 
value  assigned  to  x  in  node  i  is  used.  The  all-uses  criterion  requires  the  test  data  to  exer- 
cise at  least  one  path  to  each  such  node  or  edge.  The  all-du-paths  criterion  requires  that 
all  of  the  du-paths  from  i  to  each  such  node  or  edge  be  exerci.sed.  The  criteria  all-p-uses, 


THE  DATA  FLOW  TESTING  CRITERIA 

A  test-set/driver-program  pair  (T,D;  satisfies  criterion  C  for  subprogram  P  if  and  only  if 
for  each  node  i  in  P's  flow  graph  and  each  x  €  defii)  the  set  n  of  paths  executed  by  T 
covers  the  following  associations: 


CRITERION 

ASSOCIATIONS  REQCIRED 

All-defs 

Some  (i,j,xj  s.t.  jedcu(x,i)  or 
some             (i,(j,k),x)             s.t. 
(i.k)edpu(x,i). 

All-c-uses 

All  (i.j.x)  s.t.  jedcu(x,i). 

A!l-p-uses                                                          All  (i.(j.kl.\)  s.t.  ( j,k  )6  dpu(\.i  i. 

All-p-uses/some-c-uses 

All  (i,(J,k).x)  s.t.  (j,k)€dpu(x.i). 
In  addition,  if  dpu(x,i)=0  then 
some     (i,j.x)    s.t.    jedcuCx.i). 
Note  that  since  i  has  a  global 
definition  of  x,  dpu(x,i)=0  => 
dcu'x.i)*0. 

All-c- use  s/some-p- uses 

All    (i,j.x)    s.t.   jedcu(x,i).     In 
addition,    if    dcu(x,i)=0    then 
some            (i,(j,k),x)            s.t. 
(j.k)€dcu(x,i).    Note  that  since 
i  has  a  global  definition  of  x. 
dcu(x.i)=0  =>  dpu{x.i)*:0. 

All-uses 

All  (i,j,x)  s.t.  j  e  dcu(x,i)  and 
all  (i,(i.k).x)  s.t.  (i.k)edpu(x.i). 

AU-du-palhs 

All  du-paths  from  i  to  j  with 
respect  to  x  for  each  jedcu(x,i) 
and  all  du-paths  from  i  to  (j,k) 
with    respect    to    x    for    each 
(j.k)€dpu(\,i). 

For  comparison  we  also  define  the  criteria  all-nodes  (all-edges,  all-paths,  respectively) 
which  require  that  fl  cover  every  node  (every  edge.  ever\  path,  respectively)  in  the  flow 
graph. 
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all-c-uses,  all-p-uses/some-c-uses,  and  all-c-uses/some-p-uses  place  emphasis  on  either 
c-uses  or  p-uses.  Note  that  any  subprogram  has  only  finitely  many  definition-use  associa- 
tions, so  none  of  the  DF  criteria  requires  an  infinite  amount  of  test  data.  Upper  bounds 
on  the  amount  of  test  data  required  by  the  DF  criteria  are  established  in  [WEY84]. 

Criterion  C  \  includes  criterion  C2  if  and  only  if  for  even,'  subprogram,  any  test- 
set/driver-program  pair  which  satisfies  C\  also  satisfies  C^-  Criterion  C\  strictly 
includes  criterion  C2,  denoted  C \  =>  C2,  if  and  only  if  C \  includes  C 2  and  for  some 
subprogram  P  there  is  a  test-set/driver- program  pair  which  satisfies  C2  but  does  not 
satisfy  C  \ .  The  notion  of  subsumption  in  [CLA85]  is  similar  to  the  notion  of  inclusion. 

Rapps  and  Weyuker  proved  that  for  the  simple  language  for  which  DF  testing  was 
originally  defined,  the  relationship  among  the  criteria  is  as  shown  in  Figure  2.1.4 
[RAP85].  Clarke  et.  al.  [CLA86]  have  shown  the  relationship  of  the  criteria  defined  by 
Laski  and  Korel  [LAS 83 J  and  Ntafos  [NTA84]  to  the  DF  criteria. 

In  extending  the  theor\-  of  DF  testing  to  apply  to  programs  written  in  Pascal  we 
have  preserved  the  inclusion  relations  among  the  DF  criteria.  Doing  so  required  the 
inclusion  of  definitions  of  all  non-local  variables  in  the  entn.'  node  of  the  procedure  and 
careful  treatment  of  implicit  uses  of  the  variable  inpuA .  For  symmetry,  we  have  also 
added  the  all-c-uses  criterion,  which  was  not  defined  in  [RAP85]. 

THEOREM  2.1.1: 

The  relationship  between  the  data  flow  testing  criteria  is  as  shown  in  Figure  2.1.4. 
That  is,  DF  criterion  CI  includes  criterion  C2  if  and  only  if  the  inclusion  is  explicitly 
shown  in  the  figure,  or  if  it  follows  from  the  transiti\ity  of  the  relation. 

PROOF: 
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Rapps  and  Weyuker  proved  a  theorem  analogous  to  this  one  for  the  data  flow  testing 
criteria  as  originally  defined  in  [RAP85].  Some  minor  modification  of  their  proof  is 
needed  to  account  for  the  fact  that  our  flow  graphs  differ  from  those  used  by  Rapps  and 
Weyuker.  For  symmetry,  we  have  also  added  the  all-c-uses  criterion,  which  was  omitted 
from  the  original  family  of  criteria. 

The  proofs  of  all  of  the  inclusions  except  (all-p-uses  includes  all-edges)  are  obvious 
and  will  be  omitted.  The  proofs  of  strictness  of  inclusions  and  of  incomparabiliiy  are 
essentially  the  same  as  the  corresponding  proof  in  [RAP85].  The  same  programs  and  test 
sets  used  in  that  proof  can  be  used  here.  Although  there  are  some  minor  differences  in 
the  form  of  the  flow  graphs  corresponding  to  these  programs,  the  proofs  go  through. 

It  remains  to  show  that  all-p-uses  =>  all-edges,  that  all-c-uses/some-p-uses  =>  all-c- 
uses,  and  that  all-c-uses  is  incomparable  w  ith  the  other  criteria. 

Proof  that  all-p-uses  =>  all-edges: 

Let  T  be  a  test  set  which  satisfies  all-p-uses  for  P  and  let  Y\  be  the  set  of  paths 
through  P"s  flow  graph  which  are  executed  by  T.  Let  (i.j)  be  an  edge  in  the  flow  graph. 

case  1:  (i,j)  is  labelled. 

Then  since  P  satisfies  the  NSUP  property,  there  is  a  definition-use  association 
(k,(i,j),x)  for  some  variable  x  and  node  k.  Since  T  satisfies  all-p-uses,  H  covers  (k,(i,j),x), 
so,  n  covers  edge  (i,j). 

case  2:  (i.j)  is  not  labelled. 

case  2a:  There  only  one  path  from  the  entr}'  node  to  edge  (i,j). 

Then  every  path  in  n  covers  edge  (i.j).  Since  P  satisfies  NSUP  and  NSL.  FI  is  not 
empty,  so  there  is  at  least  one  such  path.  Thus  P  covers  edge  (i,j). 
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case  2b:  There  is  more  than  one  path  from  the  entr\'  node  to  edge  (i,j). 
Let  n  be  the  unique  node  such  that 

1)  n  has  more  than  one  successor,  and 

2)  there  exists  exactly  one  acyclic  path  from  node  n  to  edge  (ij). 

Let  m  be  the  unique  successor  of  n  such  that  there  is  an  acyclic  path  from  n  to  m  to  (i,j). 
Then  ever>'  path  uhich  covers  edge  (n.m)  covers  edge  (i,j).  Since  n  has  more  than  one 
successor,  edge  (n,m)  is  labelled,  so  (by  NSUP),  there  is  a  def-p-use  association 
{k,(n,m),x).  n  covers  (k,(n,m),x),  hence  covers  (n,m),  hence  covers  (i,j). 

FYoof  that  all-c-uses/some-p-uses  =>  all-c-uses. 

It  is  obvious  that  all-c-uses/some-p-uses  includes  all-c-uses.  To  see  that  the  inclu- 
sion is  strict,  consider  the  program  whose  flov.  graph  is  shoun  in  Figure  2.1.5.  The  only 
definition-c-use  associations  in  this  program  are  (4,8,x)  and  (l,10,inputT).  The  set 
T={(0,0))  satisfies  all-c-uses  for  this  program,  but  it  does  not  cover  either  of  the 
definition-p-use  associations  (3,(6.7).z)  or  (3,(6,8),z),  hence  does  not  satisfy  all-c- 
uses/some-p-uses. 

Proof  that  all-c-uses  is  incomparable  with  the  other  criteria: 

The  set  T=((0,0;)  satisfies  all-c-uses  for  the  program  shown  in  Figure  2.1.5.  but 
does  not  satisfy  all-p-uses/some-c-uses.  all-p-uses,  all-edges,  all-nodes,  or  all-defs,  so 
all-c-uses  does  not  include  any  of  these  criteria. 

To  see  that  none  of  the  criteria  all-p-uscs/some-c-uses.  all-p-uses.  all-edges,  all- 
nodes,  or  all-dcfs  include  ihc  all-c-uscs  criterion,  consider  the  program  shown  in  Figure 
2.1.6  The  test  set  |(-1V(5))  satisfies  the  criteria  all-p-uses/some-c-uses.  all-p-uses.  all- 
edges,  all-nodes,  and  all-defs.    However,  it  does  not  cover  the  def-c-usc  association 
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(4,7, x),  and  hence  does  not  satisfy  the  all-c-uses  criterion. ■ 

2.  Axiomatic  Evaluation  of  Data  Flou  Testing  Criteria 

Weyuker  has  defined  eight  axioms  which,  she  argues,  should  be  satisfied  by  a 
"good"  adequacy  criterion  [WEY86].  In  this  section  we  explore  which  of  these  axioms 
are  satisfied  by  the  data  flow  testing  criteria. 

Weyuker's  model  of  computation  assumes  that  the  objects  being  tested  are  single- 
entry,  single-exit  programs  which  take  all  of  their  inputs  at  the  beginning  and  which  have 
a  single  output  statement  at  the  end.  We  v.ill  make  certain  minor  modifications  in  the 
axioms  in  order  to  make  them  apply  to  testing  Pascal  subprograms. 

Axiom  1:  Applicability 

For  every  subprogram,  there  exists  an  adequate  test  set. 

None  of  the  data  flow  testing  criteria  satisfy  the  applicability  axiom.  Consider  the 
following  program,  whose  flov.  graph  is  shown  in  Figure  2.2.1. 

program  Applicability_CounterExample: 

var  x,y,z:  integer; 

begin 

read(x.y,z); 
if  x>Othen 
begin 

v:=0; 
z:=0; 
end; 

if  X  <0  then 
begin 

WTiteln(y): 

if  z>()  then  writeln('hello') 
end, 
end. 

There  are  no  executable  paths  which  cover  the  association  f3.8.y).  which  is  required 
by  the  all-defs  and  all-c-uses  criteria,  nor  the  associations  (3,(8,9).z),  or  (3,(9, 10), z), 
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which  are  required  by  the  all  p-uses  criterion.  Therefore  no  test  set  is  C-adequate  for 
testing  this  program,  for  any  data  flow  criterion  C.  Note  that  this  program  does  not  have 
any  unexecutable  code. 

While  the  above  program  was  contrived  to  demonstrate  that  none  of  the  criteria 
satisfy  the  applicability  axiom,  there  are  many  "realistic"  subprograms  for  which  no  set 
of  test  data  is  C-adequate  when  C  is  all-p-uses,  all-c-uses,  all-c-uses/some-p-uses,  all-p- 
uses/some-c-uses.  all-uses,  or  all-du-paths.  For  example,  any  subprogram  having  a  for 
loop  in  which  the  upper  bound  is  always  greater  than  or  equal  to  the  lower  bound  has  a 
def-p-use  association  which  is  not  covered  by  any  executable  path.  In  Figure  2.2.2,  the 
only  executable  path  from  the  definition  of  i  in  node  1  to  the  p-use  of  i  on  edge  (2,4)  is 
the  path  which  goes  around  the  loop  5  times.  Since  this  path  has  a  definition  of  i,  we  can 
conclude  that  no  executable  path  covers  the  definition-p-use  association  (1,(2,4), i). 

Axiom  2:  Non-Exhaustive  Applicability 

There  is  a  subprogram  P  and  test  set  T  such  that  P  is  adequately  tested  by  T  and  T  is  not 
an  exhaustive  test  set. 

All  of  the  data  flow  testing  criteria  satisfy  this  axiom.  Consider  the  following  pro- 
gram whose  flow  graph  is  shown  in  Figure  2.2.3. 

program  Non-exhaustive_Applicability( input, output); 

var  X  :  integer; 

begin 

read(x); 

if  x>0  then  wnteln('hello') 

else  writeln('goodbye') 
end 

The  test  set  T={(0),(1))  is  C-adequate  for  testing  this  program,  where  C  is  any  data  flow 
testing  criterion.  On  the  other  hand,  the  exhaustive  te.st  set  for  this  program  is  the  set  of 
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iniegers. 

Axiom  3:  Monotonicity 

If  T  is  adequate  for  P  and  TqT'  then  T'  is  adequate  for  P. 

It  is  obvious  that  all  of  the  data  flov.  criteria  satisfy  the  monotonicity  property  since 
all  of  the  associations  covered  by  T  are  also  covered  by  7"'. 

Axiom  4:  Inadequate  Empty  Set 

The  empty  set  is  not  adequate  for  any  subprogram. 

Since  every  subprogram  in  the  class  we  are  considering  satisfies  the  No-Syntacric- 
Undefined-P-uses  Propeny  and  the  Non-straight-line  Property,  each  subprogram  has  at 
least  one  definition-p-use  association.  It  follows  that  the  empty  set  is  not  adequate  for 
any  of  the  criteria  all-p-uses.  all-p-uses/some-c-uses.  all-c-uses/some-p-uses,  all-uses,  or 
all-du-paths. 

Since  even,'  subprogram  has  a  definition  of  inpuA  in  the  entry  node  and  a  c-use  of 
inpui^  in  the  exit  node,  the  all-c-uses  criterion  satisfies  the  inadequate  empty  set  axiom. 
Note,  however,  that  the  fact  that  all-c-uses  satisfies  this  axiom  can  be  construed  as  being 
an  anifact  of  the  definitions. 

.Axiom  5:  Aiili-tAliiisioii.ilitx 

There  arc  subprograms  P  and  Q  such  that  P  is  equivalent  to  Q,  T  is  adequate  for  P.  but  T 
is  not  adequate  for  Q. 

All  of  the  data  flow  testing  criteria  satisfy  the  Anti-extensionality  propeny.  Con- 
sider the  program*;  P  and  Q  uhosc  flow  graphs  are  shown  in  Figure  2.2.4.  Both  of  ihcsc 
programs  lake  as  input  a  single  integer,  x.  and  output  the  absolute  value  of  x    The  test  set 
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T={(-1),(0))  is  C-adequate  for  P  for  any  data  flow  testing  criterion  C.  However,  since  T 
does  not  cover  node  3  in  Q,  it  does  not  satisfy  the  all-nodes  criterion,  hence  does  not 
satisfy  all-p-uses,  all-p-uses/some-c-uses,  all-uses,  or  all-du-paths  for  Q.  T  does  not 
cover  the  def-c-use  association  (4,5, y)  in  Q,  which  is  required  by  the  all-defs  criterion, 
and  hence  does  not  satisfy  the  criteria  all-defs,  all-c-uses,  all-c-uses/some-p-uses  for  Q. 
So  while  T  satisfies  all  of  the  criteria  for  P,  it  satisfies  none  of  them  for  the  equivalent 
program  Q. 

Axiom  6:  General  Multiple  Change 

There  are  subprograms  P  and  Q  which  are  the  "same  shape",  and  a  test  set  T  such  that  T 
is  adequate  for  P,  but  T  is  not  adequate  for  Q.  (Roughly  speaking.  P  and  Q  are  the  same 
shape  if  P  can  be  transformed  into  Q  by  performing  one  or  more  simple  transformations 
such  as  replacing  a  relational  operator  with  another  relational  operator,  replacing  a  con- 
stant with  another  constant,  etc.) 

All  of  the  data  flow  testing  criteria  satisfy  this  axiom.  Consider  again  the  programs 
in  Figure  2.2.4.  These  programs  are  the  same  shape  because  Q  is  obtained  from  P  by 
changing  the  relational  operator  in  an  if-then-else  statement.  As  shown  above.  The  test 
set  T=((-1).(0))  satisfies  all  of  the  data  flow  testing  criteria  for  P,  but  satisfies  none  of 
the  criteria  for  Q,  Note  that  when  considering  the  anti-cxtcnsionality  axiom,  the  relevant 
feature  of  programs  P  and  Q  was  their  semantic  closeness,  but  when  considering  the  gen- 
eral multiple  change  axiom,  the  relevant  feature  is  their  syntactic  closeness. 

Avioni  7:  Anliditoniposilion 

There  exists  a  subprogram  P  and  a  "component"  Q  such  thai  T  is  adequate  for  P.  T'  is  the 
set  of  vectors  of  values  that  that  variables  can  ussutnc  u:i  ciiujikc  iv  <,>  Iit  some  t  of  T, 
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and  T'  is  not  adequate  for  Q. 

A  component  of  P  is  a  contiguous  sequence  of  statements  of  P.  Since  our 
definitions  of  the  data  flow  testing  criteria  apply  to  testing  a  subprogram,  not  a  com- 
ponent, we  will  modify  the  axiom  slightly.  The  modified  axiom  captures  the  intuition 
behind  Weyuker's  axiom,  but  fits  our  model  of  computation. 

Antidecomposition : 

There  is  a  subprogram  P  which  calls  a  subprogram  Q  and  a  test  T  such  that  T  is  ade- 
quate for  the  program  P'  obtained  from  P  by  expanding  in-line  one  call  to  Q,  but 
the  test-set/driver-program  pair  (T,P)  is  not  adequate  for  Q. 

None  of  the  data  flow  testing  criteria  except  all-c-uses  and  all-c-uses/some-p-uses 
satisfy  the  antidecomposition  axiom.  The  fact  that  all-c-uses  and  all-c-uses/some-p-uses 
do  satisfy  the  axiom  is  an  artifact  of  the  definitions,  which  we  will  discuss  below. 

With  certain  exceptions  which  we  will  examine  below,  each  definition-use  associa- 
tion (3  in  Q  corresponds  to  a  definition-use  association  a'  in  the  ponion  of  P'  which 
corresponds  to  Q.  The  test  case  which  covers  a'  when  input  to  P'  will  also  cover  a  when 
P  is  run. 

For  example,  consider  the  following  programs,  whose  flow  graphs  are  shown  in  Fig- 
ure 2.2.5. 


program  P(input.output): 

var  X  :  integer; 

procedure  Q(x:  integer); 

begin 

if  x>0  then  writelnChcllo') 
else  writeln( 'goodbye"; 

end:  (Q) 

begin  IP) 

read(x): 

Q(x); 
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write(x) 


end. 


program  /"(input, output); 
var  x,y,tl  :  integer; 
begin  (P) 

read(x); 

(in-line  expansion  of  call  to  Q  begins} 

tl  :=  x; 

if  tl>Othen  wTiteln('hello') 

else  writelnC 'goodbye') 

{in-line  expansion  of  call  to  Q  ends) 

write(x) 
end. 

The  association  (6,(7,8),x)  in  Q,  which  arises  from  the  fact  that  the  parameter  x  has  a 
definition  in  node  1  and  a  p-use  on  edge(7,8),  corresponds  to  the  association 
(13,(13, 14),tl)  in  P'.  Both  are  covered  by  any  test  case  consisting  of  an  integer  which  is 
greater  than  or  equal  to  0. 

The  exceptions  arise  due  to  the  fact  that  we  have  given  the  exit  node  of  a  procedure 
a  c-use  of  each  var  parameter  and  each  global  variable.  If  some  global  variable  (or  var 
parameter)  has  a  definition  in  Q  which  reaches  the  exit  node  of  Q  and  if  that  variable  (or 
the  actual  parameter  corresponding  to  the  var  formal  parameter)  is  never  used  at  any 
point  in  P  which  is  reachable  from  the  call  to  Q.  then  Q  will  have  a  definition-c-use  asso- 
ciation which  does  not  correspond  to  any  definiiion-c-use  association  in  /". 

For  example,  consider  the  programs  whose  flow  graphs  are  shown  in  Figure  2.2.6. 
Let  y  be  a  global  variable  Procedure  Q  has  a  definition  of  y  in  node  9  which  reaches  the 
exit  node  of  Q  and  which  is  never  used  in  the  calling  program,  P.  The  test  set  T={(0))  is 
adequate  for  P'  according  to  the  all-defs.  all-c-uses.  and  all-c-uses/somcp-uses  criterion. 
However.  1  is  not  adequate  for  Q  according  to  any  of  these  criteria.   Because  it  is  non- 
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local  to  Q,  y  has  a  c-use  in  node  1 1,  the  exit  node  of  Q.  To  satisfy  all-defs,  all-c-uses,  or 
all-c-uses/some-p-uses  for  Q  a  test  set  must  cover  the  definition  c-use  association 
(9,1  l,y),  thus  must  include  an  element  (Y),  where  Y  is  strictly  less  than  zero. 

Note  that  the  fact  that  these  criteria  satisfy  the  antidecomposition  axiom  is  an 
artifact  of  the  definitions.  If  the  model  of  data  flow  were  based  on  accurate  interpro- 
cedural  data  flow  analysis,  or  if  the  class  of  subprograms  were  restricted  to  exclude  those 
having  data  flow  anomalies  like  the  one  in  the  program  in  Figure  2.2.6  then  the  criteria 
would  fail  to  satisfy  the  axiom. 

Axiom  8:  Anticomposition 

There  exist  subprograms  P  and  Q  and  test  set  T  such  that  T  is  adequate  for  P  and  P(T)  is 
adequate  for  Q,  but  T  is  not  adequate  for  P;Q. 

Weyuker's  model  of  computation  assumes  that  all  input  statements  appear  at  the  begin- 
ning of  a  program,  and  that  all  output  statements  appear  at  the  end  of  a  program.  "P;Q" 
denotes  the  program  obtained  from  two  programs  P  and  Q.  which  use  the  same  set  of 
identifiers,  by  replacing  P's  unique  output  and  exit  statements  by  Q'  where  Q'  is 
obtained  from  Q  by  deleting  the  input  statements.  We  will  modify  the  Anticomposition 
axiom  slightly  obtain  an  axiom  which  captures  the  intuition  motivating  Weyuker's 
definition,  but  which  fits  our  model  of  computation. 

Anticomposition: 

There  is  a  subprogram  R  whose  statement  part  consists  of  a  call  to  procedure  P  fol- 
lowed immediately  by  a  call  to  procedure  Q.  and  a  test  set  T  for  R  such  that  the 
test-sei/driver-program  pair  (T.R)  is  adequate  for  P  and  for  Q.  but  T  is  not  adequate 
for  the  subprogram  R'  obtained  from  R  by  expanding  P  and  Q  in-line. 
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All  of  the  data  flow  testing  criteria  satisfy  the  Anticomposition  axiom.  To  see  that  the 
criteria  all-defs,  all-c-uses,  all-c-uses/some-p-uses,  all-uses,  and  all-du-paths  satisfy  the 
axiom,  consider  the  following  programs,  whose  flow  graphs  are  shown  in  Figure  2.2.7 


program  R(input,output); 
var  x,y,z:  integer; 
procedure  P; 
begin  (P) 
y:=0; 
read(x): 

ifx>=Othen  v:=  1; 
end;   (P) 
procedure  Q; 
begin  (Q) 

read(z); 

If  z  <  0  then  writeln(v); 
end;   {P) 
begin  (R) 
P; 

0 
end 


program  /?'(input,ourput); 
var  x,y,z:  integer; 
begin!/?') 

y:=0; 

read(x); 

if  X  >=0  then  y.=  1; 

read(z); 

if  z  <  0  then  wriieln(y;; 
end 

Let  T=|(-1.-1),(0.0)).  Then  the  test  set/driver  program  pair  (T,R)  satisfies  the  all  of  the 
data  flow  testing  criteria  for  P  and  for  Q.  However.  T  docs  not  satisfy  the  cnteria  all- 
dcfs.  all-c-uses.  all-c-uses/some-p-uses,  all-uses,  or  all-du-paths  for  R'  To  do  so,  T 
would  have  to  cover  the  dcfiniiion-c-usc  association  ( 16.19.y).  That  is,  T  would  have  to 
include  an  clcmcni  (a.h)  where  a>()  and  b<0.  A  similar  example  shows  that  the  criieria 
all-p-uscs  and  all-p-uses/some-c-uscs  also  satisfy  the  anticomposition  axiom. 
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Summary 

The  above  results  are  summarized  in  the  table  shown  in  Figure  2.2.8. 


Axioms 

Criteria 

1 

2 

3 

4 

5 

6 

7 

8 

all-defs 

no 

ves 

ves 

ves 

ves 

ves 

no 

ves 

all-c-uses 

no 

ves 

ves 

ves 

ves 

ves 

ves 

ves 

all-p-uses 

no 

ves 

ves 

ves 

ves 

ves 

no 

ves 

all-c-uses/ 
some-p-uses 

no 

yes 

yes 

yes 

yes 

yes 

yes 

yes 

all-p-uses/ 
some-c-uses 

no 

yes 

yes 

yes 

yes 

yes 

no 

yes 

all-uses 

no 

ves 

ves 

ves 

ves 

ves 

no 

ves 

all-du-paths 

no 

ves 

ves 

ves 

yes 

ves 

no 

ves 

FIGURE  2.2.8 
Summary  of  .Axiomatic  Evaluation  of  the  Data  Flow  Testing  Criteria 

The  fact  that  none  of  the  data  flow  testing  criteria  satisfy  the  applicability  axiom  is  disap- 
pointing, though  hardly  surprising.  Statement  testing,  branch  testing,  and  path  testing 
also  fail  to  satisfy  the  applicability  axiom,  for  essentially  the  same  reason.  All  of  these 
criteria  base  the  choice  of  paths  which  the  test  data  must  execute  solely  on  the  syntax  of 
the  subprogram  being  tested.  If  some  required  path  is  unexecutable  then  the  subprogram 
cannot  be  adequately  tested. 

One  could  argue  that  the  subprograms  which  cannot  be  adequately  tested  according 
to  the  statement  testing  or  branch  testing  criteria  are  in  some  sense  pathological  because 
they  have  unexecutable  statements  or  unexecutable  branches.  (.On  the  other  hand,  one 
could  argi'"  that  there  are  good  reasons,  arising  from  "defensive  programming"  for  pro- 
grams to  have  unexecutable  code.)  Be  that  as  it  may,  the  situation  is  somewhat  different 
for  the  data  flow  testing  criteria.  There  arc  programs  which  have  no  unexecutable  state- 
ments or  branches  yet  which  cannot  be  adequately  tested  by  the  data  flow  testing  criteria. 
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Practitioners  generally  dodge  the  issue  of  inapplicable  criteria  by  eliminating  from 
consideration  those  statements,  branches,  or  def-use  associations  which  cannot  be  exe- 
cuted. Since  it  is  undecidable  whether  a  given  statement,  branch  or  def-use  association 
can  be  executed,  this  process  cannot  be  fully  automated. 

In  chapter  four  we  will  define  a  new  family  of  adequacy  criteria  which  are  derived 
from  the  data  flow  testing  criteria  by  eliminating  unexecutable  associations  from  con- 
sideration. These  new  criteria  do  satisfy  the  applicability  axiom.  We  will  see  that  there 
are  some  significant  differences  between  the  formal  properties  of  this  family  and  those  of 
the  data  flow  testing  criteria. 
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CHAPTER  3: 
The  Design  and  Implementation  of  ASSET 

1.  Introduction 

We  have  built  a  software  testing  tool  called  ASSET  (A  System  to  Select  and  Evalu- 
ate Tests)  which  is  based  on  the  data  flow  testing  criteria.  In  this  chapter  we  will  discuss 
asset's  design  and  implementation. 

ASSET  takes  as  input  a  Pascal  program  P,  the  name  of  a  subprogram  Q  of  P,  a  set 
of  test  data,  and  one  of  the  data  flow  testing  criteria.  It  runs  the  program  P  on  the  test 
data  and  produces  a  list  of  those  definition-use  associations  in  Q  which  are  required  by 
the  given  criterion  but  not  covered  by  the  test  data. 

The  user  then  examines  the  output  to  determine  whether  the  program  has  behaved 
according  to  its  specification.  If  the  program  computed  the  wrong  output,  the  user 
repons  that  the  program  has  a  error.  If  the  program  has  behaved  correctly  on  all  of  the 
test  cases,  and  if  the  criterion  has  not  been  satisfied,  the  user  selects  additional  test  data 
and  continues  the  testing  process.  If  the  program  has  behaved  correctly  and  the  criterion 
has  been  satisfied  the  user  can  either  release  the  program,  certifying  that  the  subprogram 
Q  has  been  adequately  tested  according  to  the  criterion,  or  can  select  another  criterion 
and  continue. 

In  section  2  we  describe  ASSET'S  design  and  in  section  3  we  describe  the  imple- 
mentation of  the  current  version  of  ASSET,  which  is  an  experimental  prototv'pe.  A  script 
of  an  ASSET  session  can  be  found  in  section  6  of  the  ASSET  User  manual,  which  is 
included  as  Appendix  I. 
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2.  The  Design  of  ASSET 

ASSET  is  an  interactive  system  consisting  (primarily)  of  five  modules  which  com- 
municate with  each  other  via  files.  Execution  of  the  modules  is  invoked  by  commands 
issued  by  the  user,  via  a  user  interface.  The  modules  are  (A)  Front  End,  (B) 
Association-Finder,  (C)  Compiler,  (D)  Tester,  and  (E)  Checker.  The  front  end  is  the  only 
module  which  is  concerned  with  the  syntax  of  the  language  in  which  the  program  being 
tested  is  written.  Thus,  extending  ASSET  to  handle  programs  written  in  other  languages 
should  be  fairly  straight  forward. 

In  this  section  we  describe  each  of  these  modules  and  their  interaction  with  one 
another.  Figure  3.2.1  is  a  schematic  diagram  of  ASSET'S  design.  We  conclude  this  sec- 
tion with  a  discussion  of  some  of  the  limitations  of  this  design. 

Module  A:  The  Front  End 

This  module  is  similar  to  the  front  end  of  a  compiler.  It  takes  as  input  a  syntacti- 
cally correct  program  (written  in  the  subset  of  Pascal  described  in  Chapter  2)  P,  which 
we  will  call  the  subject  program,  and  a  subprogram  Q  of  P,  which  we  will  call  the  subject 
procedure.  It  produces  a  symbol  table  with  information  on  the  identifiers  visible  to  the 
statement  part  of  Q,  the  flow  graph  of  Q,  a  table  called  the  def-use  table,  and  a  program 
called  the  modified  subject  program.  The  def-use  table  lists  the  definitions  and  c-uses  in 
each  node  and  p-uses  on  each  edge.  The  modified  subject  program  is  a  program  which, 
for  each  test  case  t,  simulates  the  behavior  of  P  on  input  t,  and  which  also  produces  a 
trace  of  the  path  through  Q's  flow  graph  which  is  executed  when  P  is  run  with  input  t. 

To  do  this,  the  front  end  must  perform  lexical  analysis,  parse  all  declarations  which 
are  visible  to  the  statement  part  of  Q,  and  (at  least  partially)  parse  the  statement  part  of  Q. 
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The  Design  of  ASSET 
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Q  is  divided  into  basic  blocks  and  the  flow  graph  is  built  according  to  the  rules  in  Figure 
2.1.1.  Each  node  in  the  flow  graph,  or  equivalently  each  basic  block  (including  certain 
empty  basic  blocks),  is  given  a  unique  number. 

The  statements  in  each  basic  block  b  of  Q  are  examined.  The  sequence  of 
definitions  and  c-uses  of  variables  occurring  in  the  b  are  recorded  and  the  p-uses  of  vari- 
ables on  edges  whose  source  is  b  are  recorded  according  to  the  rules  in  Figure  2.1.1.  It  is 
not  necessary  to  completely  parse  the  expressions  occurring  in  Q,  only  to  identify  the 
variables  occurring  in  each  expression. 

The  modified  subject  program  is  obtained  from  P  by  adding  a  probe  at  the  begin- 
ning of  each  basic  block  in  Q,  which  when  executed,  will  write  the  block's  number  to  a 
file  called  the  program  trace.  A  probe  is  which  appends  a  zero  to  the  program  trace  each 
time  a  call  to  Q  is  about  to  terminate  is  also  added  to  the  modified  subject  program. 
Extra  begins  and  ends  are  added  as  needed.  In  order  to  place  a  probe  in  every  block, 
repetitive  statements  are  translated  into  a  form  which  involves  gotos  and  labels.  For 
example,  the  statement 

while  X  >  0  do  writeln(x): 

in  the  subject  procedure  would  be  translated  into  a  fragment  of  the  form 

Lx:     probe(«i); 

if  not  (x  >  0)  then  goto  L2; 

probe(/i2); 
writeln(x); 
gotoLi; 
L2:     probe(n3): 
probe(n4); 

where  Li  and  L2  are  statement  labels,  the  n,  are  node  numbers,  and  the  probes  write  the 
node  numbers  to  the  program  trace  file.  The  modified  subject  program  must  also  include 
declarations  of  the  program  trace  file,  of  labels,  of  temporary  variables  introduced  while 
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translating  for  loops,  and  of  any  other  identifiers  which  may  appear  in  the  probes. 

Module  B:  Association-Finder 

This  module  takes  as  input  the  def-use  table  and  flow  graph  produced  by  the  front 
end,  and  one  of  the  data  flow  testing  criteria  and  produces  a  list  of  all  of  the  definition- 
use  associations  (of  the  appropriate  kind)  in  subprogram  Q.  If  the  criterion  is  all-defs, 
all-p-uses,  all-c-uses,  all-p-uses/some-c-uses,  all-c-uses/some-p-uses,  or  all-uses  then  the 
relevant  associations  are  definition-c-use  and/or  definition-p-use  associations.  If  the  cri- 
terion is  all-du-paths,  the  relevant  associations  are  du-paths.  Since  the  number  of  du- 
paths  may  be  exponential  in  the  number  of  nodes  in  the  flow  graph,  ASSET  does  not 
prepare  a  list  of  them  unless  necessary. 

There  are  several  possible  ways  to  build  the  list  of  definition-c-use  and  definition- 
p-use  associations.  The  problem  can  be  viewed  as  finding  all  of  the  definitions  which 
reach  each  use  of  a  variable  (use-definition  chaining)  or  as  finding  all  of  the  uses  which 
can  be  reached  by  each  definition  of  a  variable  (definition-use  chaining).  There  are 
several  well-known  polynomial-time  algorithms  for  these  problems  [AH086,HEC77). 

Finding  all  of  the  du-paths  in  Q  has  an  exponential  lower  bound.  This  is  because 
the  number  of  du-paths  is  potentially  exponential,  and  each  of  the  du-paths  must  be 
traversed  explicitly. 

Module  C:  Compiler 

This  module  is  an  ordinary  Pascal  compiler.  In  ASSET,  it  takes  the  modified  sub- 
ject program  as  input  and  produces  an  object  file  which  we  will  call  the  modified  object 
code. 
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Module  D:  Tester 

This  module  takes  as  input  the  modified  object  code,  and  a  set  T  of  test  data.  For 
each  test  case  t  in  T,  it  executes  the  modified  program  with  t  as  an  input.  This  produces 
the  output  P(t)  which  would  be  produced  by  running  P  on  t,  and  also  appends  the  path 
(or  paths)  executed  through  Q  to  the  program  trace  file. 

Module  E:  Checker 

This  module  checks  which  of  the  associations  required  by  the  criterion  have  been 
covered  by  the  test  set  T.  It  then  determines  whether  the  criterion  has  been  satisfied  and, 
if  not,  outputs  a  list  of  associations  which  required  by  the  criterion  but  which  have  not 
been  covered. 

We  associate  with  each  definition-c-use  association  and  each  definition-p-use  asso- 
ciation a  regular  expression  which  describes  the  set  of  paths  covering  the  association. 
Let  A  be  the  alphabet  { 1,...,N}  where  Q's  flow  graph  has  N  nodes.  For  each  variable  v, 
let  D'^^A  be  the  set  of  node  numbers  corresponding  to  nodes  which  have  a  global 
definition  of  v.  Let  (d,u,v)  be  a  def-c-use  association.  The  regular  expression 
r'^"'^=A*d(A-D^)*uA*  describes  the  set  of  paths  which  cover  the  association  in  the 
sense  that  if  7t  is  a  path  from  the  entry  to  the  exit  of  Q  then  k  covers  (d,u,v)  if  and  only  if 
71  belongs  to  the  language  generated  by  r'^"-'^.  Similarly,  the  regular  expression 
/•'^•^•'•^=A*d(A-D^)*stA*  describes  the  set  of  paths  which  cover  the  def-p-use  associa- 
tion (d,(s,t),v). 

If  the  criterion  is  all-defs,  all-c-uses,  all-p-uses,  all-c-uses/some-p-uses,  all-p- 
uses/some-c-uses,  or  all-uses,  ASSET  simulates  the  operation  of  deterministic  finite  auto- 
mata which  correspond  to  these  regular  expressions.   By  submitting  each  of  the  paths 
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traversed  by  the  test  data  as  input  to  each  of  the  automata,  ASSET  determines  which  of 
the  def-c-use  and  which  of  the  def-p-use  associations  have  been  covered  by  the  test  set. 
The  algorithm  in  Figure  3.2.2  simulates  the  parallel  operation  of  the  automata.  Note  that 
only  one  pass  over  the  program  trace  file  is  needed  to  determine  which  of  the  def-c-use 
and  def-p-use  associations  are  covered  by  the  test  set. 

If  the  criterion  is  all-c-uses,  all-p-uses,  or  all-uses,  ASSET  then  simply  checks 
whether  all  of  the  automata  corresponding  to  the  appropriate  type  of  use  have  accepted  at 
least  one  input.  Those  which  have  not  accepted  correspond  to  associations  which  have 
not  been  covered.  If  the  criterion  is  all-defs,  all-c-uses/some-p-uses,  or  all-p-uses/some- 
c-uses,  then  a  little  more  bookkeeping  is  necessary.  For  example,  for  the  all-defs  cri- 
terion, it  is  necessary  to  check  whether  any  of  the  associations  arising  from  a  panicular 
definition  of  a  variable  have  been  covered. 

If  the  criterion  is  all-du-paths,  ASSET  uses  pattern-matching  to  determine  which 
du-paths  have  been  covered  by  the  test  set.  For  each  du-path  6  and  each  path  k  in  the 
program  trace,  ASSET  treats  7t  as  text,  and  5  as  a  pattern,  and  searches  the  text  to  deter- 
mine whether  the  pattern  occurs.  The  pattern  6  occurs  in  7t  if  and  only  if  the  path  n  cov- 
ers the  du-path  5.  By  using  a  pattern  matching  algorithm  which  never  has  to  back-up  in 
the  text  (such  as  the  Knuth-Morris-Pratt  Algorithm  [KNU77]).  and  by  checking  the  pat- 
terns represented  by  all  of  the  du-paths  in  parallel  (in  a  manner  similar  to  the  above  simu- 
lation of  the  parallel  execution  of  several  automata)  ASSET  can  check  all  of  the  du-paths 
in  a  single  pass  over  the  program  trace  file. 

Limitations 

One  limitation  of  this  design  is  that  the  program  traces  do  not  distinguish  between 
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const 


type 


var 


begin 


V=        number  of  variables; 

N  =        number  of  nodes  in  flow  graph; 

CU  =      number  ofdef-c-use  associations; 

PU  =       number  ofdef-p-use  associations; 

VariableRange:     1..V; 

NodeRange:         1..N; 

C_UseRange:       1..CU; 

P_UseRange:        1..PU; 

StaieType:  (ql,q2,q3); 

automaton  1  =       record 


v: 

VariableRange; 

d.u: 

NodeRange; 

state: 

StateType; 

accepted: 

boolean; 

end; 

automaton!  = 

record 

v: 

VariableRange; 

d.s.L- 

NodeRange; 

state: 

StateType; 

accepted: 

boolean; 

end; 

def_c_use: 

array  [C_UseRange]  of 

automaton  1 ; 

def_p_use: 

array  rP_UseRange]  of  automaton!; 

i: 

integer; 

fori:=l  toCUdo 

initialize  defcuseli]  so  that  the  fields  d.u.  and  v 
represent  the  node  with  the  definition,  node  with  the  c-use. 
and  variable,  respectively  in  the  /'*  def-c-use  association 
and  initialize  state  to  ql  and  accepted  to  false; 
fori:=l  toPUdo 

initialize  def_p_use[ij  so  that  the  fields  d.s,i.  and  v 

represent  the  node  with  the  definition,  source  and  target  of  the  edge  with  the  p-use, 
and  variable,  respectively  in  the  /**  def-p-use  association 
and  initialize  state  to  ql  and  accepted  to  false; 
read(program_trace,node); 
while  node  o  END_OF_PATH_MARKER  do 
begin  (process  next  node  in  the  path) 

for  i:=l  to  CU  do  (update  state  of/'*  "def-c-use"  automaton) 
with  def_c_use[i]  do 

if  not  accepted  then 
case  state  of 

ql:  if  node  =  d  then  state  :=q2; 
q2:  if  node  =  u  then  accepted  :=  true 
else  if  ((nodeeD*)  and  (node  o  d) 
then  state  :=  ql; 
end;  (case) 
end;  (for) 

for  i:=l  to  PU  do  (update  state  of  /'*  "def-p-use"  automaton) 
with  def_p_usc[i]  do 

if  not  accepted  then 
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if  (d  <>  s)  then 

case  state  of 

q  1 :  if  node  =  d  then  state  :=  q2; 

q2;  if  node  =  s  then  state  :=  q3 

else  if  ((nodee D")  and  (node  o  d)) 
then  state  :=  ql; 
q3:  if  node  =  l  then  accepted  :=  true 

else  if  node  =  s  then  state  :=  q3 

else  if  ((nodeeD')  and  (node  o  d)) 
then  state  :=  ql 

else  state  :=  q2; 
end  (case) 
else  {d  =  s) 
case  state  of 

ql :  if  node  =  d  then  state  :=  q3; 
q3:  if  node  =  t  then  accepted  :=  true 

else  if  (node  o  d)  then  state  :=  ql 
end;  {case] 


end;  (for) 
read(progTam_trace  ,node) ; 
end;  (while) 
end; 


Figure  3.2.2 
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different  invocations  of  the  same  procedure.  This  presents  certain  problems  when  the 
procedure  being  tested  calls  itself  recursively.  For  example,  consider  the  following  pro- 
gram: 

program  driver(input,output); 
var  a :  integer, 
procedure  Q(x  :  integer); 
var  y  :  integer; 
begin 

read(y); 

if  X  >=0  then 

begin 

Q(x-l); 
write(y) 
end 
end;  {Q) 
begin  {driver) 
read(a); 
Q(a) 
end. 

The  flow  graph  for  procedure  Q  is  shown  in  Figure  3.2.3.  The  test  set  T={(0))  executes 
the  path  (1,2,3,4,6,7)  through  its  first  invocation  of  procedure  Q,  and  the  path  (1,2,5,6,7) 
through  its  second  invocation  of  procedure  Q.  The  first  of  these  has  a  definition-clear 
subpath  with  respect  to  y  from  node  2  to  node  4,  hence  covers  the  def-c-use  association 
(2,4,y).  However,  the  design  of  ASSET  described  here  would  say  that  the  paths  executed 
were  (1,2,3,1,2,5,6,7)  and  (4,6,7).  Therefore  it  would  not  see  that  the  association  (2,4,y) 
had  been  covered.  It  is  left  to  the  user  to  determine  by  hand  that  the  association  has  been 
covered. 

A  more  sophisticated  design  could  circumvent  this  problem  by  dynamically  tagging 
the  node  numbers  in  the  program  trace  with  a  procedure-invocation-ID,  then  separating 
the  ponions  of  the  program  trace  which  arise  from  different  invocations.  In  the  above 
example,  the  initial  program  trace  would  be 

(1.1,1.2,1.3,2.1,2.2,2.5,2.6,2.7,2.0,1.4,1.6,1.7,1.0),  where  each  element  of  the  sequence 
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X  >=  0 


(j\    read(y) 


x<0 


Q(x-l)    (3) 


writc(y)    '4 
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Flow  graph  for  procedure  Q 


FIGURE  3.2.3 
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has  the  form  invocation-ID. node-number,  and  where  node  number  0  indicates  termina- 
tion of  the  corresponding  invocation.  This  trace  could  then  easily  be  divided  into  the 
paths  (1,2,3,4,6,7)  and  (1,2,5,6,7),  as  desired.  It  is  not  clear  whether  the  benefits  of  such 
a  design  justify  the  considerable  overhead  involved. 

3.  The  Implementation  of  ASSET 

In  this  section  we  describe  the  implementation  of  the  current  version  of  ASSET, 
which  is  an  experimental  prototype. 

ASSET  runs  BSD  4.2  UNIX  systems  and  has  been  implemented  on  DEC  VAX's 
and  on  Sun  Workstations*.  The  front  end,  the  association-finder,  and  the  checker  are 
written  in  Berkeley  Pascal.  Each  of  these  is  an  individual  Pascal  program.  The  user 
interface  is  written  in  C  and  makes  system  calls  to  invoke  the  modules,  to  manipulate 
files,  and  to  invoke  various  additional  functions  which  were  not  described  in  the  above 
design  description.  Module  C,  the  compiler,  is  the  Unix  system's  Pascal  compiler,  pc. 
Module  D,  the  checker  is  implemented  as  part  of  the  user  interface. 

The  user  interface  is  user-friendly  and  is  driven  by  commands  issued  by  the  user.  In 
a  typical  ASSET  session,  the  user  will  initially  select  a  criterion  and  direct  ASSET  to 
invoke  the  front  end,  the  association-finder,  and  the  compiler.  At  this  point  the  checker 
can  be  invoked  to  produce  a  list  of  all  of  the  definition-use  associations.  The  user  then 
invokes  module  D,  the  tester,  to  run  the  modified  subject  program  on  one  or  more  test 
cases,  then  the  checker  to  determine  which  (if  any)  associations  required  by  the  criterion 
have  not  been  executed  by  the  test  data.   At  this  point,  the  user  can  add  additional  test 


•  UNIX  is  a  trademark  of  AT&T  Bell  Laboratories.  DEC  and  VAX  are  trademarks  of  Digital 
Equipment  Corporation.  Sun  Workstation  is  a  trademark  of  Sun  Microsystems,  Inc. 
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cases  to  the  test  set  and  run  the  program  again  or  can  select  a  new  criterion  and  check  the 
current  test  set  against  the  new  criterion.  A  more  complete  description  of  ASSET  from 
the  user's  viewpoint  can  be  found  in  the  ASSET  User  Manual,  included  in  this  thesis  as 
Appendix  I. 

Module  A,  the  front  end,  is  implemented  as  a  one-pass  recursive-descent  parser 
which  builds  the  symbol  table,  builds  the  flow  graph  for  Q;  classifies  variable 
occurrences  as  being  definitions,  c-uses,  or  p-uses,  storing  this  information  in  a  table 
called  the  def-use  table;  and  creates  a  preliminary  version  of  the  modified  subject  pro- 
gram in  which  the  probes  have  been  inserted.  Declarations  are  added  in  one  pass  over 
the  modified  subject  program.  The  def-use  table,  an  adjacency  list  representation  of  the 
flow  graph,  and  relevant  parts  of  the  symbol  table  are  written  to  files.  The  front  end  does 
not  completely  parse  expressions,  nor  does  it  explicitly  build  the  parse  tree. 

In  Module  B,  the  association-finder,  a  series  of  depth-first-searches  is  performed 
(unless  the  criterion  is  all-du-paths.)  For  each  definition  of  a  variable  v  having  a 
definition  in  node  i,  a  depth  first  search  is  performed  beginning  at  node  i.  Each  time  a 
node  j  is  visited,  ASSET  checks  the  def-use  table  to  see  whether  variable  v  occurs  in 
node  j.  If  node  j  has  a  c-use  of  v  then  the  def-c-use  association  (ij,v)  is  recorded.  If 
node  j  is  the  source  of  edges  (j,)t  i ) (],k„)  having  p-uses  of  v  then  the  def-p-use  associa- 
tions (i.(j^  1  ).v) (i,(j.^m).v)  are  recorded.  If  node  j  has  a  definition  of  v  then  the  search 

backtracks. 

At  most  NV  depth  first  searches  are  performed  where  N  is  the  number  of  nodes  in 
the  flow  graph  and  V  is  the  number  of  variables.  Since  the  graphs  involved  are  sparse, 
each  depth  first  search  takes  0(N)  time.  Thus  the  complexity  of  the  association-finder  is 
OCN^V). 
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This  implementation  of  the  association-finder  is  a  hold-over  from  an  earlier  version 
of  ASSET  (which  took  as  input  programs  written  in  a  simple  language  having  only  sim- 
ple variables)  in  which  some  data  flow  anomaly  detection  was  performed.  More 
efficiency  could  be  achieved  in  practice  by  using  one  of  the  algorithms  mentioned  in  sec- 
tion 2. 

If  the  criterion  is  all-defs,  all-c-uses,  all-p-uses,  all-c-uses/some-p-uses,  all-p- 
uses/some-c-uses,  or  all-uses,  then  module  E,  the  checker,  uses  the  algorithm  in  Figure 
3.2.2.  If  the  criterion  is  all-du-paths  then  the  checker  uses  the  Knuth-Morris-Pratt  pattern 
matching  algorithm.  The  du-paths  are  stored  internally  as  an  array  of  arrays  of  characters 
and  one  pass  is  made  over  the  program  trace  file. 

We  conclude  this  section  by  mentioning  some  of  the  limitations  of  the  current 
implementation  of  ASSET.  The  fact  that  ASSET  stores  various  tables  internally  as 
arrays  imposes  certain  restrictions  on  the  number  of  nodes  in  the  flow  graph,  the  number 
of  def-use  associations,  etc.  ASSET  also  places  certain  minor  restrictions  on  identifiers 
which  can  be  used  in  the  subject  program,  in  order  to  avoid  interference  with  the  names 
of  files  created  by  ASSET  and  with  identifiers  added  to  the  modified  subject  program. 
Details  of  these  restrictions  can  be  found  in  the  ASSET  User  Manual.  In  addition,  the 
User  Manual  documents  certain  bugs  in  the  current  implementation  which  will  be 
corrected  in  the  next  version. 
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CHAPTER4: 

The  Feasible  Data  Flow  Testing  Criteria 

1.  Definitions  of  the  feasible  data  flow  testing  criteria 

Given  a  subprogram  P  and  a  DF  criterion  C  it  may  be  the  case  that  no  test  data  for  P 
exists  which  satisfies  C.  This  occurs  when  none  of  the  paths  which  cover  a  particular 
association  required  by  C  are  executable.  In  such  a  case,  P  cannot  be  adequately  tested 
according  to  C.  In  this  section  we  introduce  a  new  family  of  criteria  which  are  derived 
from  the  DF  criteria  and  which  circumvent  this  problem  and  investigate  some  of  its  pro- 
penies.  We  assume  that  all  side  effects  and  aliasing  are  known. 

Recall  that  a  complete  path  is  a  path  from  the  entry  node  to  the  exit  node  of  a 
subprogram's  flow  graph.  We  will  say  that  a  complete  path  is  executable  or  feasible  if 
there  exists  some  assignment  of  values  to  input  variables,  non-local  variables,  and  param- 
eters which  causes  the  path  to  be  executed.  We  will  say  that  a  path  is  executable  if  it  is  a 
subpath  of  an  executable  complete  path.  According  to  this  definition,  the  question  of 
whether  or  not  a  given  path  through  a  subprogram  is  executable  is  independent  of  the 
context  in  which  that  subprogram  is  called.  However,  it  may  depend  on  the  effects  of 
any  procedures  or  functions  which  are  called  along  the  path.  In  section  5  we  will  dis- 
cuss the  consequences  of  modifying  this  notion  of  cxecutability  to  take  into  account 
information  about  the  context  in  which  the  subprogram  is  called.  Note  that  whether  or 
not  a  panicular  path  is  executable  depends  on  the  actual  subprogram,  not  just  on  its  def- 
use graph. 

We  will  say  that  an  association  is  executable  if  there  is  some  executable  complete 
path  which  covers  it;  otherwise  it  is  unexecutable    We  define  subsets  fdcu(x,i)  cdcu(x.i) 
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and  fdpu(x,i)  c  dpu(x,i)  whose  elements  correspond  to  those  associations  which  are  exe- 
cutable as  follows:  dcu(x,i)=(no(ies  j  such  that  xec-use(j)  and  there  is  an  executable 
definition  clear  path  with  respect  to  x  from  i  to  j}.  dpu(x,i)={tdgts  (j,k)  such  that  xep- 
use(j,k)  and  there  is  an  executable  definition  clear  path  with  respect  to  x  from  i  to  (jM)]- 
Equivalently,  fdcu(x,i)={jedcu(x,i)lthe  association  (ij,x)  is  executable}  and 
fdpu(x,i)={(jjc)edpu(x,i)lthe  association  (i,(jJc),x)  is  executable}.  For  each  DF  criterion 
C  we  define  a  new  criterion  C*  by  selecting  the  required  associations  from  fdcu(x,i)  and 
fdpu(x,i)  instead  of  from  dcu(x,i)  and  dpu(x,i).  Precise  definitions  of  these  criteria  are 
given  in  Figure  4.1.1.  We  will  refer  to  the  criteria  {(all-du-paths)*,  (all-uses)*,  (all-p- 
uses/some-c-uses)*,  (all-c-uses/some-p-uses)*,  (all-p-uses)*)  &s feasible  dataflow  testing 
criteria,  or  PDF  criteria,  for  short. 

The  FDF  criteria  satisfy  the  applicability  property:  For  any  subprogram  P  and  any 
FDF  criterion  C*,  there  is  some  test  set  T  which  satisfies  C*.  However,  the  question  of 
whether  a  particular  T  satisfies  C*  for  subprogram  P  is  undecidable.  In  going  from  the 
family  DF  to  the  family  FDF,  we  have  traded  the  undecidability  of  the  existence  ques- 
tion, "is  there  any  test  which  is  C-adequate  for  P?"  for  the  undecidability  of  the  recogni- 
tion problem  "is  a  given  test  C*-adequate  for  P?" 

Observe  that  for  any  DF  criterion  C,  C  =>  C*.  We  now  investigate  the  inclusion 
relations  among  the  FDF  criteria. 

THEOREM  4.1.1:  The  family  of  FDF  criteria  is  partially  ordered  by  strict  inclusion  as 
shown  in  Figure  4.1.2.  Funhermore,  FDF  criterion  C,*  includes  FDF  criterion  Cj*  if 
and  only  if  it  is  explicitly  shown  to  do  so  in  the  figure  or  it  follows  from  the  transitivity  of 
the  relations. 


THE  FEASIBLE  DATA  FLOW  TESTING  CRITERIA 

fdcu(x,i)  =  {J6dcu(x,i)l  the  association  (i,j,x)  is  executable) 
fdpu(x,i)  =  {(j,k)6dpu(x,i)l  the  association  (i,(j,k),x)  is  executable) 


A  test-set/driver-program  pair  (T,D)  satisfies  criterion  C  for  subprogram  P  if  and  only  if 
for  each  node  i  in  P's  flow  graph  and  each  x  e  def(i)  the  set  n  of  paths  executed  by  T 
covers  the  following  associations: 


CRITERION 

REQUIRED  ASSOCIATIONS 

(all-defsj* 

if  fdcu(x,i)  u  fdpu(x,i)  ^  0  then 
some  (i,j,x)  s.t  JGfdcu(x,i)  or 
some            (i,(j,k),x)            s.t. 
(i,k)efdpu(x,i). 

(all-c-uses)* 

all  (ij,x)  s.t.  jefdcufx.i). 

(all-p-uses)* 

all  (i.(j,k),x)  s.t.  (j,k)G  fdpu(x,i). 

(all-p-uses/some-c-uses)* 

all  (i,(j,k),x)  s.t.  (j,k)G  fdpu(x,i). 
In  addition,  if  fdpu(x,i)  =  0  and 
fdcu(x,i)  ?t  <)  then  some  (ij,x) 
s.t.  je  fdcu(x,i). 

(aJl-c-uses/some-p-uses)* 

all    (i,j,x)    s.t.   JGfdcu(x,i).     In 
addition,  if  fdcu(x,i)  =  0  and 
fdpu{x,i)     ^     0     then     some 
(i,(i,k),x)  s.t.  (j,k)efdpufx,i). 

(aJl-uses)* 

all  (ij.x)  s.t.  j  e  fdcu(x,i)  and 
all     (i,(j,k),x)     s.t.     (j.k)     e 
fdpu(x,i). 

(all-du-paths)* 

all    executable    du-paths    with 
respect   to   x   from   i   to  j   s.t. 
jedcu(x,i)   and    all   executable 
du-paths  with  respect  to  x  from 
i    to    (j.k)    for    each    (j.k)    e 
dpu(x,i). 

For  comparison  we  also  define  the  criteria  (all-nodes)*  [(all-edges)*,  (all-paths)*,  respec- 
tively 1  which  require  that  Fl  cover  each  executable  node  [each  executable  edge,  each  exe- 
cutable path,  respectively.) 


Figure  4. LI 
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(ALL-PATHS)* 


(ALL-DU-PATHS)* 
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(ALL-EDGES)" 
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/^ 


(ALL-USESj*     (ALL-NODES)* 


-.x 


(ALL-C-USES,SOME-F-USES)* 


(ALL-P-USES/SOME-C-USES)" 


(ALL-C-USES)' 


(ALL-DEFS)* 


(ALL-P-USES)* 


FIGURE  4.1.2 
The  relationship  among  the  feasible  data  flow  testing  criteria 
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PROOF: 

A.  Strictness  of  the  inclusions 

We  first  observe  that  if  subprogram  P  has  no  unexecutable  paths,  then  a  test  is  C- 
adequate  for  P  if  and  only  if  it  is  C*-adequate  for  P.  This  observation,  along  uath  the 
proofs  of  strictness  of  the  inclusions  in  Theorem  1  of  [RAP85],  none  of  which  involve 
subprograms  with  unexecutable  paths,  shows  that  all  of  the  inclusions  in  Figure  4.1.2  are 
strict.  It  thus  suffices  to  show  that  the  inclusions  in  Figure  4.1.2  hold. 

B.l.  (all-paths)*  =>  (all-uses)*  : 

Suppose  not.  Then  there  is  a  subprogram  P  and  a  set  T  of  test  data  which  is  (all- 
paths)*-adequate  for  P  but  not  (all-uses)*-adequate.  Let  n  be  the  set  of  paths  through  P 
which  T  executes.  There  exist  a  node  i  in  P  with  a  global  definition  of  some  variable  x,  a 
node  j  with  a  global  c-use  of  x  or  edge  (jM)  with  p-use  of  x,  and  an  executable  definition 
clear  path  with  respect  to  x  from  i  to  j  (respectively  from  i  to  (j,k)]  which  is  not  covered 
by  n.  This  contradicts  the  fact  that  n  covers  every  executable  path. 

The  proofs  that  (all-paths)*  =*  (all-du-paths)*  and  (all-paths)*  =>  (all-edges)*   are 
similar. 
B.2.  (all-paths)*  =>  (all-du-paths)*  : 

Suppose  not.  Then  there  is  a  subprogram  P  and  a  set  T  of  test  data  which  is  (all- 
paths)*-adequate  for  P  but  not  (all-du-paths)*-adequatc.  Let  n  be  the  set  of  paths 
through  P  which  T  executes.  There  exists  an  executable  du-path  which  is  not  covered  by 
n    This  contradicts  the  fact  that  fl  covers  every  executable  path. 

B.3.  (all-paths)*  =>  (all-edges)*  : 
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Suppose  not.  Then  there  is  a  subprogram  P  and  a  set  T  of  test  data  which  is  (all- 
paths)*-adequate  for  P  but  not  (all-edges)*-adequate.  Let  n  be  the  set  of  paths  through  P 
which  T  executes.  There  exists  an  executable  edge  (i,j)  which  is  not  covered  by  n.  This 
contradicts  the  fact  that  n  covers  every  executable  path. 

B.4.  (all-edges)*  =>  (all-nodes)*  : 

Let  T  be  a  test  which  satisfies  (all-edges)*  for  subprogram  P,  and  let  n  be  the  set  of 
paths  executed  by  T.  Let  n  be  any  executable  node  in  P.  If  n  is  the  entry  node,  then  n 
has  a  unique  successor,  m,  and  (n,m)  is  executable.  So  n  covers  (n,m)  and  hence  covers 
n.  If  n  is  not  the  entry  node,  then  since  n  is  executable,  some  branch  (i,n)  is  executable. 
So  n  covers  (i,n)  and  hence  covers  n. 

B.5.  (all-uses)*  =>  (all-p-uses/some-c-uses)*,  (all-p-uses/some-c-uses)*  =*  (all-p-uses)*, 
(all-p-uses/some-c-uses)*  =>  (all-defs)*,  (all-uses)*  =>  (all-c-uses/some-p-uses)*,  (all-c- 
uses/some-p-uses)*  =>  (all-defs)*  : 

These  inclusions  follow  immediately  from  the  definitions  of  the  criteria  given  in 
Figure  4.1.1.  For  example,  any  set  FT  of  paths  which  covers  all  of  the  associations 
required  by  (all-uses)*  will  a  fortiori  cover  all  of  the  associations  required  by  (all-p- 
uses/some-c-uses)*. 

We  next  show  that  those  relations  not  in  the  transitive  closiu-e  of  the  diagram  in  Fig- 
ure 4.1.2  do  not  hold. 

C.l.  (all-du-paths)*  (all-p-uses)*;  (all-du-paths)*  (all-p-uses/some-c-uses)*;  (all-du- 
paths)*  (all-uses)*;  (all-du-paths)*  (all-c-uses/some-p-uses)*;  (all-du-paths)*  (all- 
defs)*;  (all-du-paths)*  (all-edges)*;  (all-du-paths)*  (all-nodes)*  : 

It  suffices  to  show  that  (all-du-paths)*  (all-p-uses)*,  (all-du-paths)*  (all-defs)*,  and 
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(all-du-paths)*  (all-nodes)*.  The  rest  follows  from  the  transitivity  of  =>.  Consider  the 
subprogram  shown  in  Figure  4.1.3a.  Its  du-paths  are  shown  in  Figure  4.1.3b.  Of  these, 
only  (1,2),  (2,3,4),  (4,3,4),  and  (4,3,5)  are  executable.  Let  T  =  {(X,Y))  where  X  is  any 
integer  and  Y  <  0.  Since  T  executes  n  =  {(1,2,3,4,3,4,3,5,6,7,9,10)),  T  satisfies  (all-du- 
paths)*.  However,  n  does  not  cover  the  associations  (2,(6,8),y),  (2,8,x),  or  node  8,  all  of 
which  are  covered  by  the  executable  path  (1,2,3,4,3,4,3,5,6,8,9,10),  so  T  does  not  satisfy 
(all-p-uses)*,  (all-defs)*,  or  (all-nodes)*. 

Intuitively,  (all-du-paths)*  fails  to  include  these  criteria  because  it  is  possible  for  a 
subprogram  to  have  cenain  definition-use  associations  which  can  be  executed  only  by 
paths  which  traverse  some  loop  one  or  more  times. 

C.2.  (all-p-uses)*  (all-edges)*;  (all-p-uses/some-c-uses)*  (all-edges)*;  (all-uses)*  (all- 
edges)*;  (all-p-uses)*  (all-nodes)*;  (all-p-uses/some-c-uses)*  (all-nodes)*;  (all-uses)* 
(all-nodes)*  : 

Consider  the  subprogram  in  Figure  4.1.4,  where  y  is  a  local  variable  (and  hence 
does  not  have  a  definition  in  the  entry  node).  Notice  that  since  node  3  is  unexecutable,  y 
is  always  uninitialized  at  node  5.  In  the  absence  of  any  information  about  which  (if 
either)  edge  leaving  node  5  will  be  executed  when  the  program  is  run  on  actual  test  data, 
we  make  the  worst-case  assumption  that  edges  (5,6)  and  (5,7)  are  both  executable.  This 
would  be  the  case,  for  example,  in  an  environment  in  which  uninitialized  variables 
receive  arbitrary  values.  Since  node  3  is  unexecutable.  the  only  executable  definition-use 
associations  are  ( 1 ,2.input).  (2,  (2.4),  x)  and  (2.9,inputT).  Let  T  be  a  test  which  executes 
n=( (1.2,4.5.6.8.9))  or  n={(1.2,4.5.7.8.9)).  Then  T  satisfies  (all-p-uscs)*.  (all-p- 
uscs/some-c-uses)*.  and  (all-uses)*,  but  docs  not  satisfy  (all-edges)*  or  (all-nodes)*. 
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Notice  that  the  subprogram  in  Figure  4.1.4  has  a  data  flow  anomaly  [OST76].  We 
shall  see  below  that  this  is  not  a  mere  coincidence,  but  that  rather,  it  is  this  particular  kind 
of  anomaly  which  prevents  the  inclusions  from  holding. 

The  rest  of  the  non-inclusions  follow  from  the  incomparability  and  strictness  proofs 
in  [RAP85].  ■ 

It  seems  discouraging  that  (all-p-uses)*  fails  to  include  (all-edges)*.  Data  flow  test- 
ing was  developed  in  order  to  "bridge  the  gap"  between  branch  testing  and  path  testing. 
Since  many  "real-life"  subprograms  cannot  be  adequately  tested  using  the  unstarred  ver- 
sions of  the  data  flow  criteria,  one  would  hope  that  the  FDF  criteria  would  "bridge  the 
gap"  between  (all-edges)*  and  (all-paths)*.  We  have  seen  that  this  is  not  the  case.  We 
next  show  that  for  a  cenain  class  of  "well-behaved"  subprograms,  any  test  which  satisfies 
(all-p-uses)*  satisfies  (all-edges)*. 

DEFINITION:  We  will  say  that  a  subprogram  P  satisfies  the  No-Feasible-Undefined-P- 
uses property  (NFUP)  if  and  only  if  for  every  executable  edge  (ij)  in  P  having  a  p-use  of 
a  variable  x  there  is  some  executable  path  from  the  start  node  to  edge  (iJ)  which  contains 
a  global  definition  of  x. 

We  note  that  it  is  quite  reasonable  to  expect  subprograms  to  have  propeny  NFUP. 
If  (i.j)  is  an  edge  which  causes  NTUP  to  fail,  then  any  input  which  causes  (iJ)  to  be  exe- 
cuted will  involve  referencing  an  uninitialized  variable. 

THEORFM  4.1.2  For  the  class  of  subprograms  which  satisfy  NTUP,  (all-p-uses)*  =* 
(all-edges)*. 

PROOF:  Let  P  be  a  subprogram  satisfying  NTUP,  let  T  be  a  test  which  satisfies  (all-p- 
uses)* for  P.  let  II  be  the  set  of  paths  executed  by  T.  and  let  (ij)  be  an  executable  edge  in 
P.   Suppose  (i  j)  has  a  p-use  of  a  variable  x.   By  h\-pothesis  there  is  an  executable  path  n 
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from  the  start  node  to  (i,j)  which  includes  a  global  definition  of  x.  Let  n  be  the  last  node 
in  7t  having  a  global  definition  of  x.  Then  (n,  (i  j),  x)  is  an  executable  defininon-p-use 
association  so  it  is  covered  by  n.  Hence  (ij)  is  covered  by  Fl. 

If  (iJ)  has  no  p-uses,  then  the  result  follows  by  a  straight-forward  modification  of 
the  corresponding  part  of  the  proof  of  (all-p-uses)  =*  (all-edges)  in  Chapter  2,  section  1  .■ 

In  [RAP85]  the  class  of  subprograms  to  which  data-flow  testing  applies  is  restricted 
to  those  subprograms  satisfying  the  No-Syntactic-Undefined-P-use  Property,  defined  in 
section  2  above.  This  restriction  was  necessary  in  order  to  insure  that  all-p-uses  =>  all- 
edges.  NFUP  is  a  strengthening  of  NSUP.  It  takes  into  account  the  fact  that  in  subpro- 
grams satisfying  NSUP,  the  only  paths  7C  from  the  entry  node  to  some  p-use  of  x  such  that 
x  has  a  global  definition  in  some  node  in  7i  might  be  unexecutable. 

It  is  tempting  to  restrict  the  class  of  programs  to  which  the  FDF  criteria  apply  to 
those  satisfying  NFUP.  It  is  our  feeling  however  that  while  one  can  live  with  the  undeci- 
dability  of  the  adequate  test  recognition  problem  and  perhaps  (albeit  very  uncomfortably) 
with  the  undecidability  of  the  adequate  test  existence  problem,  one  should  at  least  be  able 
to  decide  algorithmically  whether  a  given  testing  strategy  applies  to  a  given  subprogram. 
Since  it  is  undecidable  whether  a  given  subprogram  satisfies  NFUP,  we  refirain  from 
requiring  this  property  of  subprograms  to  be  tested. 

Another  possible  way  to  force  (all-p-uses)*  to  include  (all-edges)*  would  be  to 
require  subprograms  to  satisfy  the  No-Anomalies  Propeny  (NA): 

Every  path  from  the  start  node  to  a  use  of  a  variable  x  must  contain  a  definition  of  x. 

Osterweil  and  Fosdick  [OST76]  consider  any  subprogram  not  satisfying  this  property  to 
have  data-flow  anomaly  indicative  of  possible  subprogram  error.   Since  NA  is  a  purely 
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syntactic  property  and  NA  implies  NFUP  we  could  restrict  FDF  testing  to  subprograms 
satisfying  this  property.  We  feel  that  this  is  overly  restrictive,  since  many  perfectly  good 
subprograms  fail  to  satisfy  NA. 

One  way  to  force  NA  to  be  satisfied  is  to  give  the  entry  node  a  definition  of  each 
variable.  This  would  potentially  increase  the  number  of  def-use  associations  and  thus 
make  the  criteria  more  demanding.  However,  it  would  make  the  model  of  the 
subprogram's  data  flow  reflect  the  actual  data  flow  less  accurately. 

Another  approach  is  to  perform  FDF  testing  in  conjunction  with  a  check  for  data- 
flow anomalies.  For  any  subprogram  which  satisfies  NA  and  any  test  set  T  which 
satisfies  (all-p-uses)*  the  tester  will  be  assured  that  T  satisfies  (all-edges)*  .  In  case  NA 
does  not  hold,  the  tester  should  explicitly  check  whether  (all-edges)*  is  satisfied  and  if 
necessary  add  more  test  data  or  inspect  the  subprogram  for  references  of  uninitialized 
variables. 

2.  Axiomatic  Evaluation  of  the  Feasible  Data  Flow  Testing  Criteria 

In  this  section  we  evaluate  which  of  the  axioms  for  a  good  adequacy  criterion  arc 
satisfied  by  the  feasible  data  flow  testing  criteria. 

Several  of  the  axioms  are  of  the  form  "There  exists  a  program  P  and  a  test  set  T 
such  that  some  property  holds".  If  a  data  flow  testing  criterion  C  satisfies  one  of  these 
axioms,  then  the  corresponding  feasible  data  flow  testing  criterion  C*  automatically 
satisfies  the  axiom.  In  these  cases,  the  examples  used  in  Chapter  2.2  to  prove  that  C 
satisfies  an  axiom  also  show  thai  C*  satisfies  the  axiom. 
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Axiom  1:  Applicability 

For  every  program,  there  exists  an  adequate  test  set. 

The  feasible  data  flow  testing  criteria  have  all  been  defined  in  such  a  way  as  to 
guarantee  that  the  applicability  criterion  is  satisfied.  For  each  of  these  criteria,  the 
definition-use  associations  which  must  be  covered  by  the  test  set  are  all  executable.  By 
definition,  this  guarantees  the  existence  of  adequate  test  data. 

Axiom  2:  Non-Exhaustive  Applicability 

There  is  a  program  P  and  test  set  T  such  that  P  is  adequately  tested  by  T  and  T  is  not  an 
exhaustive  test  set. 

All  of  the  feasible  data  flow  testing  criteria  satisfy  this  axiom.  This  follows  firom 
the  fact  that  all  of  the  data  flow  testing  criteria  satisfy  the  axiom. 

Axiom  3:  Monotonicity 

If  T  is  adequate  for  P  and  TcT"'  then  T'  is  adequate  for  P. 

Since  all  of  the  executable  associations  covered  by  T  are  also  covered  by  T',  all  of 
the  data  flow  criteria  satisfy  the  monotonicity  property. 

Axiom  4:  Inadequate  Empty  Set 

The  empty  set  is  not  adequate  for  any  program. 

Every  subprogram  has  at  least  one  executable  def-p-use  association.  To  see  this,  let 
n  be  the  unique  node  such  that  n  has  at  least  two  successors  and  no  node  on  any  path 
from  the  entry  node  to  node  n  has  more  than  one  successor.  At  least  one  of  the  edges 
(n,m)  whose  source  is  node  n  must  be  executable.  Since  node  n  has  more  than  one  sue- 
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cessor,  edge  (n,m)  has  a  p-use  of  some  variable,  v.  Since  the  subprogram  is  assumed  to 
satisfy  the  No-Syntactic-Undefined-p-uses  property,  there  is  a  global  definition  of  v  on 
the  path  from  the  entry  node  to  (n,m).  Therefore,  there  is  an  executable  definition-p-use 
association.  Note  that  the  path  from  the  entry  node  to  edge  (n,m)  is  a  du-path. 

It  follows  that  the  criteria  (all-defs)*,  (all-p-uses)*,  (all-p-uses/some-c-uses)*,  (all- 
c-uses/some-p-uses)*,  (all-uses)*,  and  (all-du-paths)*  satisfy  the  inadequate  empty  set 
propeny. 

The  criterion  (all-c-uses)*  fails  to  satisfy  the  inadequate  empty  set  axiom.  This  is 

because  it  is  possible  for  a  program  to  have  no  executable  definition-c-use  associations. 

Consider,  for  example,  the  program 

program  example(input,output); 

var  X  :  integer; 

begin 

read(x); 

while  x=x  do  writeln('hello') 
end. 

The  only  definition-c-use  association  in  this  program  arises  from  the  fact  that  there  is  a 
definition  of  input}  in  the  entry  node  and  a  c-use  of  input'l  in  the  exit  node.  But  this 
association  is  unexecutable  because  control  can  never  reach  the  exit  node. 

Axiom  5:  Anti-extensionality 

There  arc  programs  P  and  Q  such  that  P  is  equivalent  to  Q,  T  is  adequate  for  P,  but  T  is 
not  adequate  for  Q. 

All  of  the  feasible  data  flow  testing  criteria  satisfy  the  Anti-extensionality  axiom.. 
This  follows  from  the  fact  that  all  of  the  DF  criteria  satisfy  the  axiom. 
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Axiom  6:  General  Multiple  Change 

There  are  programs  P  and  Q  which  are  the  "same  shape",  and  a  test  set  T  such  that  T  is 
adequate  for  P,  but  T  is  not  adequate  for  Q. 

All  of  the  feasible  data  flow  testing  criteria  satisfy  the  general  multiple  change 
axiom.  This  follows  from  the  fact  that  all  of  the  DF  criteria  satisfy  the  axiom. 

Axiom  7:  Antidecomposition 

There  is  a  subprogram  P  which  calls  a  subprogram  Q  and  a  test  T  such  that  T  is  adequate 
for  the  program  P'  obtained  from  P  by  expanding  in-line  one  call  to  Q,  but  the  test- 
set/driver-program  pair  (T,P)  is  not  adequate  for  Q. 

All  of  the  feasible  data  flow  testing  criteria  satisfy  the  antidecomposition  axiom. 
Recall  that  most  of  the  DF  criteria  failed  to  satisfy  this  axiom  because  (with  certain 
exceptions,  which  are  artifacts  our  model  of  data  flow)  each  definition-use  association  a 
in  Q  corresponds  to  a  definition-use  association  a'  in  the  portion  of/"  which  corresponds 
to  Q.  However,  it  is  possible  for  a  to  be  executable,  while  a'  is  unexecutable.  An 
extreme  case  of  this  occurs  when  every  call  to  Q  from  P  is  unexecutable. 

Consider  the  following  programs,  whose  flow  graphs  are  shown  in  Figure  4.2.1. 


program  P(input,ourput); 
van  X  :  integer; 
procedure  Q; 

var  y:  integer, 
begin  {Q} 

read(y); 

if  y>=0  then  writeln('hello') 
else  writeln('goodbye') 
writeln(y); 
end;  (Q) 
begin 

read(x); 

if  x>=Othen 
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ifx<Othen  Q 


end. 


program  /"(inpui.output); 
var  x,y  :  integer, 
begin 

read(x); 

if  X  >=0  then 

if  X  <0  then 

begin  (procedure  Q  expanded  in-line} 
read(y); 

if  y>=Othen  writeln('hello') 
else  vvTiteln('gCKxibye') 
writeln(y); 
end;  (end  of  in-line  expansion  of  Q} 
end. 

Note  that  the  call  to  Q  in  P  is  unexecutable.  The  only  executable  paths  from  the 
entry  to  the  exit  of  P'  are  (16,17.18,26,27)  and  (16,17,19,20,25).  Since  the  test  set 
T={(-1),(0))  executes  both  of  these  paths,  it  covers  all  of  the  executable  definition-use 
associations  in  P',  and  hence  satisfies  all  of  the  FDF  criteria.  When  the  elements  of  T  are 
input  to  program  P,  however,  they  do  not  cover  any  of  the  associations  (ll,14,y), 
(11,(1 1,12), y),  or  (1 1,(1 1,1 3),y)  in  Q.  Note  that  these  associations  are  executable,  since 
ihey  can  be  covered,  for  example  by  the  driver-program/test-set  pair  (/"'.T)  where  P" 
just  calls  Q.  So  T  does  not  satisfy  all-defs  or  all-p-uses  for  Q,  and  hence  does  not  satisfy 
any  of  the  criteria.  This  shows  that  all  of  the  FDF  criteria  satisfy  the  Antidecomposition 
axiom. 

Recall  that  the  DF  criteria  all-defs,  all-c-uscs,  and  all-c-uses/some-p-uses  satisfied 
this  axiom  as  an  only  artifact  of  the  particular  model  of  data  flow  employed.  In  the  case 
of  the  FDF  criteria,  on  the  other  hand,  the  criteria  satisfy  the  axioms  because  of  the  fun- 
damental nature  of  the  criteria. 
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Axiom  8:  Anticomposition 

There  is  a  program  R  whose  statement  part  consists  of  a  call  to  procedure  P  followed 
immediately  by  a  call  to  procedure  Q,  and  a  test  set  T  for  R  such  that  is  adequate  for  P 
and  for  Q  (as  procedure's  called  by  R)  but  T  is  not  adequate  for  the  program  R'  obtained 
from  R  by  expanding  P  and  Q  in-line. 

All  of  the  feasible  data  flow  testing  criteria  satisfy  this  axiom.  This  follows  from 
the  fact  that  the  DF  criteria  satisfy  the  axiom. 

Combining  the  results  in  this  section  gives  us 

THEOREM  4.2.1: 

The  criteria  (all-defs)*,  (all-p-uses)*,  (all-p-uses/some-c-uses)*,  (all-c-uses/some-p- 
uses)*,  (all-uses)*,  and  (all-du-paths)*  all  satisfy  the  applicability,  non-exhaustive  appli- 
cability, monotonicity,  inadequate  empty  set,  anti-extensionality,  general  multiple 
change,  anti-decomposition,  and  anti-composition  axioms.  The  (all-c-uses)*  criterion 
satisfies  all  of  the  axioms  except  for  the  inadequate  empty  set  axiom. ■ 

3.  A  generalized  notion  of  executability 

The  definition  of  executability  given  in  Section  3  fails  to  take  into  account  any 
information  about  the  context  in  which  a  subprogram  is  called.  It  may  be  the  case  that 
there  is  no  input  data  to  the  program  as  a  whole  which  cause  the  execution  of  a  particular 
executable  path  through  a  subprogram.  In  order  to  adequately  test  such  a  subprogram 
with  respect  to  a  given  FDF  criterion  it  may  be  necessary  to  write  a  driver  program  which 
assigns  particular  values  to  global  variables  and  parameters,  and  then  calls  the  subpro- 
gram. Whether  this  extra  effort  is  "wonhwhilc"  depends  on  whether  it  is  likely  that  the 
subprogram  will  ever  bo  called  in  a  context  other  than  the  one  in  uhich  it  currently 
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appears  in  the  program.  In  this  section  we  define  a  more  general  notion  of  executability 
which  takes  into  account  information  about  the  context  in  which  a  subprogram  is  called 
and  explore  the  effects  of  this  generalization  on  the  FDF  criteria. 

Consider  the  program: 


program  main(input,output); 
type  CharString  =  arraj'[1..10]  of  char, 
var  string  1  :  CharString; 
length  :  integer. 


procedure  WriteString(str:  CharString;  n  :  integer); 
{Writes  the  first  n  characters  of  str  to  standard  output.} 
var  i :  integer; 
begin 

for  i  :=  1  to  n  do  write(str[i]) 
end; 

begin  {statement  part  of  main  program} 


if  length  >  0  then  WriteString(stringl, length) 
else  ... 


end.  {main  } 

Suppose  that  at  every  point  in  the  program  at  which  WriteString  is  called,  the  value  of  n 
is  guaranteed  to  be  strictly  greater  than  zero.  Then  no  input  to  the  program  can  cause  the 
execution  of  the  path  through  the  procedure  which  traverses  the  loop  zero  times. 

In  order  to  adequately  test  WriteString  with  respect  to  the  criterion  (all-uses)*  it  is 
necessary  to  include  test  data  which  causes  the  for  loop  to  be  traversed  zero  times.  To  do 
this,  one  must  write  a  driver  program  which  calls  WriteString  with  the  second  parameter 
having  a  value  less  than  or  equal  to  zero.  If  we  think  that  we  might  actually  want  to  use 
the  procedure  WriteString  in  a  less  restricted  context  (for  example  due  to  modifications 
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of  the  calling  program  or  to  re-using  the  procedure  in  a  different  program)  then  this  is  a 
reasonable  thing  to  do.  On  the  other  hand,  if  we  are  fairly  certain  that  the  procedure  will 
never  be  called  in  a  context  where  n  is  less  than  or  equal  to  zero,  then  writing  a  driver 
program  could  be  construed  as  being  a  wasted  effort.  What  is  needed  is  a  notion  of  test 
data  adequacy  which  takes  into  account  information  about  the  context  in  which  the  sub- 
program being  tested  can  be  called. 

We  can  achieve  this  by  relativizing  the  definition  of  executability  as  follows.  We 
associate  with  the  subprogram  to  be  tested,  a  predicate  IC{V i,...,Vi^),  called  the  input 
constraint  where  V\,...,Vi,  represent  the  subprogram's  parameters  and  non-local  vari- 
ables. A  path  through  the  subprogram  is  then  executable  relative  to  IC  if  there  exists 
some  assignment  of  values  to  input  variables,  parameters,  and  non-local  variables  which 
satisfies  IC  and  which  causes  the  path  to  be  executed.  A  path  is  executable  as  defined  in 
Section  4. 1  if  and  only  if  it  is  executable  relative  to  the  input  constraint  IC^TRUE. 

The  notion  of  executability  of  an  association  and  the  definitions  of  the  PDF  criteria 
can  be  relativized  in  a  straight  forward  manner. 

The  relationship  among  the  relativized  FDF  criteria  is  essentially  the  same  as  that 
among  the  non-relatived  criteria.  The  definitions  must  be  modified  to  reflect  the  fact  that 
the  objects  being  tested  now  consist  of  pairs  (P,IC)  where  Pisa  subprogram  and  IC  is  an 
input  constraint.  We  say  thai  relativized  criterion  Cj  includes  relativized  criterion  Ci  if 
for  every  subprogram/input-constraint  pair  (P,IC)  every  test  which  satisfies  Cj  for  that 
pair  also  satisfies  €2-  C]  strictly  includes  C2  if  Cj  includes  C2  and  for  some  pair 
(P.IC)  there  is  a  test  which  satisfies  C2  but  docs  not  satisfy  C\.  It  is  easy  to  show  that 
the  relationship  among  the  relativized  criteria  is  as  shown  in  Figure  4.1.2. 
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One  reasonable  choice  for  the  input  constraint  is  the  predicate  ]C^"^  obtained  by 
taking  the  constraints  on  the  input  to  the  program  as  a  whole  (drawn  from  the  program's 
specification),  conjoining  them,  and  "pushing  them  through"  the  program  to  all  points  at 
which  the  subprogram  being  tested  is  called.  In  practice,  one  might  want  to  use  a  weaker 
predicate  than  /C*''^  which  can  be  built  up  during  the  testing  process  as  follows.  Sup- 
pose that  at  some  point  in  the  testing  process  the  tester  notices  that  a  particular  execut- 
able association  has  still  not  been  exercised.  Upon  examining  the  program  to  see  what 
values  of  input  data,  non-local  variables  and  parameters  would  cause  the  execution  of 
that  association,  the  tester  sees  that  the  needed  values  of  non-local  variables  and  parame- 
ters cannot  arise  in  the  context  of  the  program  as  a  whole.  One  can  then  formulate  a  con- 
straint which  reflects  this  fact  and  conjoin  it  to  the  previous  constraint. 

If  the  calling  program  is  modified  some  time  after  the  subprogram  has  been  certified 
adequately  tested,  the  predicate  IC  will  provide  useful  documentation  which  will  help  in 
selecting  additional  test  data  for  the  subprogram.  If  IC  is  still  satisfied  whenever  the  sub- 
program is  called  then  no  further  testing  of  the  subprogram  will  be  needed.  If  IC  no 
longer  hold  at  the  points  of  call,  however,  it  will  be  necessary  to  update  IC,  determine 
which  def-use  associations  become  executable  relative  to  the  new  set  of  constraints,  and 
add  test  data  to  exercise  those  associations. 
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CHAPTER5: 

The  Path  Expression  Heuristic 

1.  Introduction 

In  order  to  determine  whether  a  test  set  T  satisfies  a  feasible  data  flow  testing  cri- 
terion C,  it  is  necessary  to  examine  all  of  the  definition-use  associations  required  by  C 
but  not  covered  by  T  to  see  whether  any  of  them  are  executable.  This  may  be  a  tedious 
and  error  prone  task. 

In  this  chapter  we  present  a  heuristic  called  the  path  expression  method  which 
attempts  to  determine  whether  a  given  definition-use  association  (d,u,v)  is  executable. 
There  are  two  "hard"  aspects  to  this  problem.  First,  there  may  be  infinitely  many  paths 
which  cover  the  association.  Second,  there  is  no  algorithm  to  determine  whether  a  partic- 
ular path  is  executable.  Our  approach  involves  using  regular  expressions  called  path 
expressions  to  describe  a  set  n  of  paths  which  could  potentially  be  executable  paths  cov- 
ering the  association,  and  using  a  combination  of  symbolic  evaluation  and  data  flow 
information  to  eliminate  from  consideration  subsets  of  U  consisting  of  paths  which  can 
be  shown  to  be  unexecutable  or  to  have  definitions  of  v. 

The  approach  is  heuristic  in  the  sense  that  it  will  either  answer  "association  is  unex- 
ecutable", or  "association  may  be  executable".  If  it  answers  "association  is  unexecut- 
able" then  the  association  is  guaranteed  to  in  fact  be  unexecutable.  If  it  answers  "associa- 
tion may  be  executable"  then  the  association  may  be  executable  or  unexecutable.  Hope- 
fully, for  "realistic"  programs,  it  will  only  infrequently  answer  "association  may  be  exe- 
cutable" when  the  association  is  actually  unexecutable.  When  it  answers  "association 
may  ho  executable"  il  will  also  supply  some  information  which  may  help  the  user  in 
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deiermining  by  hand  that  the  association  is  unexecutable,  or  in  selecting  a  test  case  which 
covers  the  association. 

Let  P  be  the  subprogram  being  tested.  Throughout  this  chapter,  unless  otherwise 
noted,  we  will  make  the  following  assumptions,  in  addition  to  the  assumptions  in  Chapter 
2,  section  1 : 

1 .  No  aliasing.  That  is,  we  assume  that  when  P  is  called,  all  of  the  actual  parameters 
corresponding  to  var  formal  parameters  are  distinct  from  each  other  and  distinct 
from  all  global  variables  which  occur  in  P.  We  also  assume  that  no  pointer  vari- 
ables occur  in  P.  We  will  relax  this  assumption  later. 

2.  No  hidden  side  effects.  That  is,  the  only  variables  visible  to  P  which  are  modified 
by  any  procedure  or  function  P'  called  by  P  are  those  variables  which  are  passed  to 
/"  as  var  parameters. 

3.  Functions  called  by  P  have  no  var  parameters. 

4.  P  has  no  repeat-until  statements. 

Assumptions  1  and  2  insure  that  the  data  flow  information  is  sufficiently  accurate. 
Assumption  3  simplifies  the  symbolic  evaluation  which  will  be  performed.  Assumption 
4  is  a  technical  convenience  which  insures  that  the  exit  from  each  loop  in  the  flow  graph 
of  P  is  at  the  top  of  the  loop.  In  section  5.6  we  discuss  the  effects  of  relaxing  these 
assumptions. 

Loops  in  the  flow  graph  will  play  an  important  role  in  the  path  expression  method. 
A  node  h  in  a  flow  graph  G  is  a  loop  header,  if,  when  a  depth  first  search  is  performed  on 
G,  starting  at  the  entry  node,  h  is  the  target  of  a  back  edge.  In  the  recursive  definition  of 
structured  flow  graphs  given  in  Figure  2.1.1,  the  loop  headers  are  the  nodes  labelled  with 
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the  lener  "i",  in  the  subgraphs  corresponding  to  for  statements  and  while  statements  (and 
the  entry  node  to  5 1  in  the  subgraph  corresponding  to  the  repeat-until  statement.)  If  (e,h) 
is  the  back  edge  whose  target  is  h*,  then  the  we  define  the  loop  corresponding  to  hxoht 
the  subgraph  of  G  consisting  of  h  along  with  those  nodes  n  such  that  there  is  a  path  from 
n  to  e  which  does  not  pass  through  h.  A  path  through  the  loop  headed  by  node  /z  is  a  path 
7t=  Oi,...,/*),  where  k>l,  /i=/;t=h,  and  ij  is  a  node  in  the  loop  headed  by  node  h,  for 
l<j<k.  Note  that  according  to  this  definition,  the  path  K=(h)  is  a  path  through  the  loop 
headed  by  node  h. 

We  begin  this  chapter  with  an  overview  in  Section  5.2  of  how  the  path  expression 
method  works.  In  section  5.3,  we  define  the  syntax  of  path  expressions  and  show  how 
they  can  be  used  to  represent  a  set  of  paths  through  a  flow  graph.  A  new  symbolic 
evaluation  technique  called  partial  symbolic  evaluation  of  path  expressions  is  described 
in  Section  5.4.  A  complete  description  of  the  method  is  given  in  Section  5.5.  Some  limi- 
tations and  enhancements  are  discussed  in  Section  5.6.  An  algorithm  for  the  derivation 
of  a  path  expression  representing  the  set  of  paths  from  a  given  node  to  a  given  node  or 
edge  is  presented  in  Appendix  11.  A  brief  description  of  our  (partial)  implementation  of 
the  path  expression  heuristic  and  a  script  of  a  "typical"  session  are  given  in  Appendix  III. 

2.  Intuition  Motivating  the  Path  Expression  Method 

In  this  section  we  introduce  the  path  expression  method  by  presenting  two  exam- 
ples. 


•  Note  that  for  siruclurcd  graph.s.  there  is  only  one  such  edge. 


-85- 

Example  5.2.1: 

We  begin  with  a  simple  example  based  on  a  common  program  fragment: 

begin 

done  :=  false; 
while  not  done  do 
begin 

SI; 

if  B  then  done  :=  true 

else  S2; 

S3; 
end; 
end; 

where  S1,S2,  and  S3  are  arbitrary  statements  and  B  is  a  boolean  expression.  The  flow 

graph  for  this  program  fragment  is  shown  in  Figure  5.2.1. 

Consider  the  definition-p-use  association  (l,(2,8),done).   Let  n  be  the  set  of  paths 
from  node  1  to  edge  (2,8).  We  can  represent  this  set  by  a  path  expression 

lp2-8. 
The  symbols  1  and  8  are  node  symbols  which  represent  nodes  1  and  8,  respectively.  The 
symbol  P2  is  a  loop  symbol  which  represents  the  set  of  paths  through  the  loop  headed  by 
node  2.  We  think  of  the  path  expression  as  meaning  "any  path  in  n  consists  of  node  1 
followed  by  some  arbitrary  path  through  the  loop  whose  header  is  node  2,  followed  by 
node  8."  This  representation  of  n  is  "high-level"  in  the  sense  that  it  omits  details  about 
what  happens  inside  of  the  loop. 

For  a  path  n  in  n  to  be  executable,  the  values  of  the  variables  must  satisfy  the 
expression  E=(Jo/ie=true)  when  n  exits  from  the  loop. 

Since  the  variable  done  is  set  to  false  in  node  1 ,  the  expression  E  cannot  be  satisfied 
by  the  values  of  the  variables  at  the  bottom  of  node  1.  Therefore,  along  any  executable 
path  from  1  to  8,  the  expression  E  must  change  value  within  the  loop. 
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done:=  false; 


Ta^  done:=  true; 


FIGURE  5.2.1 
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In  order  for  E  to  change  value,  one  of  the  variables  in  E  must  have  a  definition.  In 
this  case,  the  only  variable  in  E  is  done  so  we  can  eliminate  from  consideration  any  path 
along  which  done  has  no  definition.  But  we  are  looking  for  paths  which  are  definition 
clear  with  respect  to  done,  so  we  can  also  eliminate  any  paths  which  do  have  a  definition 
of  done.  We  have  eliminated  all  of  the  possibilities,  so  we  can  conclude  that  the  associa- 
tion is  unexecutable. 

Note  that  in  this  case  we  did  not  have  to  pay  attention  to  any  details  about  what  hap- 
pens within  the  loop.  In  other  cases  we  will  have  to  rewrite  the  path  expression  to  make 
it  express  more  details  about  what  happens  inside  of  loops.  In  either  case,  path  expres- 
sions provide  a  convenient  formalism  which  helps  us  to  look  at  the  problem  at  "the  right 
level  of  detail". 

A  similar  situation  arises  from  a  program  fragment  consisting  of  a  for  loop  in  which 
the  upper  bound  is  always  greater  than  or  equal  to  the  lower  bound.  If  the  counter  vari- 
able X  is  set  equal  to  the  lower  bound  in  node  i  and  if  edge  (j,k)  is  the  exit  from  the  loop, 
then  the  def-use  association  (i,(j,k),x)  is  unexecutable.  An  analysis  analogous  to  the  one 
above  can  be  used  to  prove  this.  These  two  common  "paradigms"  of  unexecutable  asso- 
ciations can  by  shown  to  be  unexecutable  by  a  straight-forward  application  of  the  path 
expression  method. 

Example  5.2.2: 

Our  next  example  is  somewhat  more  complicated.  Consider  the  association 
(2,(6,14),switch)  in  the  "bubble-son"  program  shown  in  Figure  5.2.2.  The  set  of  paths 
from  node  2  to  edge  (6,14)  can  be  represented  by  the  path  expression 

a=2-p3-5p6l4. 


^<i- 


^ 
^ 


switch:=  true; 


read(a[ij) 


n:-a-  1 


FK.l  RK  52.2 
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We  think  of  a  as  saying  that  any  path  from  node  2  to  edge  (6,14)  stans  at  node  2,  then 
takes  an  arbitrary  path  through  the  loop  headed  by  node  3,  then  goes  to  node  5,  then  takes 
an  arbitrary  path  through  the  loop  headed  by  node  6,  then  ends  at  node  14.  In  order  to 
execute  branch  (6,14),  the  expression 

E=  ((«  <  2)  or  {switch  =  false)) 
must  be  satisfied  by  the  values  of  the  variables  at  the  bottom  of  node  6.  Since  the  values 
of  the  variables  never  satisfy  E  when  control  reaches  the  bottom  of  node  5,  we  know  that 
on  any  executable  path  at  least  one  of  the  variables  in  the  set  [n,switch}  must  change 
value  in  the  loop  headed  by  node  6. 

At  this  point  we  need  more  information  about  the  paths  through  the  loop  headed  by 
node  6.  We  replace  the  symbol  p^  by  a  sub-path-expression  which  includes  details  about 
the  body  of  this  loop.  We  obtain 

a'=2p3-5(6-7-p8l3)*614. 
We  think  of  a'  as  adding  the  information  that  any  path  through  the  loop  headed  by  node 
6  consists  of  zero  or  more  subpaths  which  start  at  node  6,  go  to  node  7,  take  an  arbitrary 
path  through  the  loop  headed  by  node  8,  and  end  at  node  13;  followed  by  a  final  visit  to 
node  6. 

Since  we  are  looking  for  def-clear  paths  with  respect  to  switch  we  can  eliminate  any 
path  along  which  switch  is  always  redefined.  Since  switch  has  a  definition  in  node  7,  the 
only  definition  clear  paths  with  respect  to  switch  are  those  represented  by  the  path 
expression 

a"=2p3-5-6  14 
a"  says  that  the  only  path  through  the  loop  headed  by  node  6  which  we  need  to  consider 
is  the  path  consisting  of  the  single  node  6.  But  none  of  the  variables  in  {n,switch]  are 
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defined  in  node  6,  E  cannot  change  value  in  the  loop,  so  the  definition-use  association  is 
unexecutable. 

Overview  of  the  path  expression  method 

We  now  give  a  rough  overview  of  the  path  expression  method.  The  ideas  presented 
here  will  be  made  more  precise  in  the  following  sections. 

Given  an  association  (d,u,v)  we  begin  by  deriving  a  path  expression  a  which 
represents  the  set  of  paths  from  d  to  u,  and  which  omits  details  about  paths  through  loops. 
The  path  expression  has  node  symbols  which  represent  single  nodes  and  loop  symbols 
which  represent  sets  of  paths  through  loops.  As  the  algorithm  progresses  we  use  a  com- 
bination of  symbolic  evaluation  and  data  flow  analysis  to  eliminate  from  consideration 
paths  which  can  be  shown  to  be  unexecutable  or  which  can  be  shown  to  have  definitions 
of  v. 

For  each  loop  symbol  occurring  in  a  we  use  symbolic  evaluation  to  attempt  to  find  a 
collection  of  sets  of  variables,  {5i,...,5i}  such  that  for  each  h,  at  least  one  of  the  vari- 
ables in  the  set  5/,  must  be  redefined  within  the  loop  on  any  executable  path  covering 
(d.u.v).  We  then  replace  the  loop  symbol  by  a  sub-expression  a'  which  includes  more 
details  about  the  body  of  the  loop,  a'  represents  the  set  of  definition-clear  paths  with 
respect  to  v  through  the  loop.  If,  for  some  h,  none  of  the  variables  in  S/,  have  definitions 
on  any  of  the  paths  represented  by  a'  then  there  are  no  executable  paths  in  the  set  of 
paths  represented  by  a,  so  the  association  is  unexecutable. 

The  algorithm  terminates  when  cither 
1.       ihc  current  path  expression  represents  the  empty  set  of  paths  (in  this  case  the  associ- 
ation is  unexecutable). 


-91- 

or 

2.  the  current  path  expression  contains  no  more  loop  symbols  (in  this  case  the  associa- 
tion may  be  executable). 

3.  Definition  of  Path  Expressions 

We  now  define  the  syntax  and  semantics  of  path  expressions*.  Let  P  be  a  procedure 
with  flow  graph  G. 

Let  Z  =  {n,  I  n,  €  Nodes(G)}  u  {p,  I  «,  is  a  loop  header  node  in  G).  We  will  refer  to  the 
symbols  n,  as  node  symbols  and  the  symbols  p,  as  loop  symbols.  We  will  say  that  the 
symbols  p,  and  n,  correspond  to  the  node  «,-.  Let  Q  =  {  (,  ),  *,  •,  [^,  X,  0  ).  A  path 
expression  is  a  regular  expression  e  (ZuQ)*  defined  in  the  usual  way  [DAV83]. 
Specifically, 

1.  X  is  a  path  expression. 

2.  0  is  a  path  expression. 

3.  Each  element  of  Z  is  a  path  expression. 

4.  If  a  and  P  are  path  expressions  then  so  are 
i.  (ap) 

ii.  (auP) 
iii.  a* 


•  Tarjan  [TAR81A,TAR81B]  and  Wegman  [WEG83],  and  others,  have  also  used  regular  ex- 
pressions to  represent  sets  of  paths  through  flow  graphs.  Our  formalism  differs  somewhat  from 
theirs,  in  that  we  exploit  the  structure  of  the  flow  graphs,  placing  special  emphasis  on  loop 
headers.  By  so  doing,  we  loose  some  generality,  but  gain  the  ability  to  ignore  details  about  the 
internal  structure  of  loops  until  they  arc  needed. 
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5.  Only  those  strings  formed  by  a  finite  number  of  applications  of  rules  1  through  4 
are  path  expressions. 

We  associate  a  set  PATHS  (a)  of  sequences  of  nodes  with  each  path  expression  a  as 
follows: 

1.  PATHS (X)  =  (X},  the  set  whose  only  element  is  the  null  path. 

2.  PATHS(0)  =  0,  the  empty  set. 

3.  PATHS(n,)  =  {az,  ),  the  set  whose  only  element  is  the  path  of  length  one  whose 
only  node  is  Oj. 

4.  PATHS(p,)  =  {a  I  a  is  a  path  which  begins  and  ends  with  node  «,  and  which  con- 
tains no  nodes  which  are  not  in  the  loop  headed  by  n, } 

5.  PATHS(a-p)  =  PATHS(a)PATHS(p)=  (opIaePATHSCa)  and  pePATHSCp)) 

6.  PATHS(ai^p)  =  PATHS(a)  ^  PATHS(p) 

7.  PATHS(a*)  =  (PATHS(a))* 

If  S  is  a  set  of  paths  and  a  is  a  path  expression,  we  will  say  that  a  represents  S  if  and 
only  if  PATHS (a)=S. 

A  path  expression  a  is  valid  if  every  element  of  PATHS(a)  is  actually  a  path  in  G. 
That  is,  a  is  valid  if  and  only  if 

X\-  ...  Xk  G  PATHS(a)  =>  (x,.j:,>])  is  an  edge  in  G  for  1  <  i  <  k  -  1. 

We  will  say  that  two  valid  path  expressions,  a  and  P  are  equivalent,  denoted  a  =  P  if  and 
only  if  PATHS(a)  =  PATHS(P). 

As  a  notational  convenience  we  eliminate  extraneous  parentheses,  adopting  the  con- 
vention that    *    has  precedence  over  "•"  which  has  precedence  over   \j      ^'^ c  will 
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sometimes  use  the  numeral  "i"  written  in  bold-face  in  place  of  n,. 

We  adopt  the  notational  convention  of  using  lower  case  Greek  letters  near  the 
beginning  of  the  alphabet  to  represent  path  expressions,  and  lower  case  Greek  letters  near 
the  end  of  the  alphabet  to  represent  paths. 

If  a  is  a  substring  of  P  and  a  is  a  path  expression  we  will  say  that  a  is  a  sub-path- 
expression  of  |3. 

It  will  be  necessary  to  distinguish  between  sub-path-expressions  and  occurrences  of 
sub-path-expressions.  For  the  sake  of  clarity,  we  will  represent  path  expressions  as 
binary  trees  in  the  obvious  way.  We  will  refer  to  the  nodes  of  these  binary  trees  as  ver- 
tices in  order  to  avoid  confusing  them  with  nodes  of  the  procedure's  flow  graph.  Each 
vertex  of  the  binary  tree  will  be  labeled  with  a  symbol  from  ZuQ.  Each  leaf  vertex  will 
be  labelled  with  a  node  symbol,  a  loop  symbol,  "k  or  0.  Each  intemal  vertex  will  be 
labelled  with  *,  •,  or  ^^.  Vertices  with  the  symbols  •  and  ^^  will  have  two  children, 
while  vertices  with  the  symbol  *  will  only  have  a  left  child. 

The  path-expression  corresponding  to  a  tree  T  can  easily  be  recovered  by  perform- 
ing an  in-order  traversal  of  T: 

function  PathExpr(T:  tree):  path-expression; 
begin 

if  T  is  the  null  tree  then  return  null  path-expression 

else 

begin 

sym  :=  label(root(T)); 
ifsyme  {"•","*", "<^")  then 

return(CONCAT("(",PathExpr(left-subtree),sym,PaihExpr(right-subtree),")")) 
else  return(sym) 
end 
end; 

Let  T  be  a  tree  representing  some  path  expression  a.  Then  each  subtree  of  T 
corresponds  to  an  occurrence  of  a  sub-path-expression  of  a.  We  will  sometimes  denote  a 
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subtree  of  T  which  corresponds  to  a  sub-path-expression  P  by  T^ .  The  notation  (T°T^) 
will  denote  a  tree  whose  root  is  labelled  with  the  symbol  •,  whose  left  subtree  is  7°  and 
whose  right  subtree  is  T^.  Similarly,  (T°^\^T^)  will  denote  a  tree  whose  root  is  labelled 
with  the  symbol  \^,  whose  left  subtree  is  7"  and  whose  right  subtree  is  T^;  the  notation 
(7°)*  will  denote  a  tree  whose  root  is  labelled  with  the  symbol  *,  whose  left  subtree  is 
7"  and  whose  right  subtree  is  null.  We  will  sometimes  omit  parentheses  when  it  is 
unecessary  to  distinguish  between  details  of  the  structure  of  the  trees.  For  example, 
7a.7P.7Y  ^,iii  denote  either  ((7°T^)T^)  or  (7"-(7Pt^)).  The  symbol  T'^+T^  will 
denote  the  set  of  trees  {7"-7P,  7"^^7^  7P<^7"} 

Let  7°  be  a  subtree  of  7^.  We  will  say  that  tc  is  a  T^-subpath  of  p  if  and  only  if 
ne  PATHS(a),  pe  PATHS(Y),  Ji  is  a  subpaih  of  p  and  either 

1 .  7^  7°-h7^  or  7^€  7^+7°,  or  7^6  (7°)*  or 

2.  71  is  a  7°-subpath  of  a  and  o  is  a  7'^-subpath  of  p. 

Intuitively,  if  7i  is  a  T^-suhpaih  of  p,  7t  is  the  portion  of  p  which  comes  from  the 
occurrence  of  the  sub-path-expression  a  of  y  corresponding  to  the  subtree  7". 

Let  7]  and  Ti  be  trees,  where  T2  may  be  a  subtree  of  7;.    We  define  the  72- 
reduction  of  7]  to  be  the  path  expression  computed  by  the  following  function: 
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function  Reduce(7"i,r2):  path-expression; 
begin 

if  Ti  is  the  null  tree  then  return  (null  path-expression) 

else 

begin 

sym  :=  label(root(7i)); 

if  syme  {Node-symbols, Loop-symbols}  then  retiim(sym) 

else 

begin 

L  :=  left  subtree  of  Tj ; 
R  :=  right  subtree  of  7"  i ; 
ifsym=  "  'then 

return(CONCAT(Reduce(L,r2)."-".Reduce(R,7"2))); 
if  sym  =  "[^"  then 

if  T2  is  a  subtree  of  L  then  retum(C0NCAT("-",Reduce(L,r2))) 
else  if  72  is  a  subtree  of  R  then  retum(C0NCAT("-",Reduce(R,r2))) 

elsereturn(CONCAT(Reduce(L,r2)/'u"'^^"'^^<^'^2))) 
if  sym  =  "*  "  then 

if  7^2  is  a  subtree  of  L  then  return(C0NCAT("-",Reduce(L,r2))) 

elsereturn(CONCAT("(",Reduce(L,7-2),")"/'*")); 
end; 
end; 
end; 

Let  Y  be  the  T^-reduction  of  7"  where  T^  is  a  subtree  of  7°.  Then  y  is  a  path 
expression  which  represents  a  subset  of  PATHS(a)  consisting  entirely  of  paths  having 
7P-subpaths.  Furthermore,  7^5p-e  where  5  and  e  are  path  expressions.  Note  that  if 
there  is  no  subtree  7^  of  7"  such  that  7^  is  a  subtree  of  7^  and  the  root  of  7^  is  labelled 
with  \j"  or  "*",  then  the  7'^-reduction  of  7"  is  a,  itself. 

We  associate  with  each  valid  path  expression  a  three  sets  of  variables, 
ALWAYS(a),  SOMETIMES(a)  and  NEVER(a)  which  indicate  the  variables  which 
have  definitions  along  all,  some,  or  none  of  the  paths  in  PATHS  (a).  We  define  these  sets 
recursively  as  follows: 

Let  VAR_SET  be  the  set  of  all  variables  visible  to  the  procedure  P. 
ALWAYS(A.)  =  0 
SOMETIMES(X)  =  0 
NEVER(X.)  =  VAR_SET 
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ALWAYS(0)  =  VAR_SET 
SOMETIMES(0)  =  0 
NEVER(0)  =  VAR_SET 

ALWAYS(n,)  =  {v  g  VAR_SET  I  v  has  a  global  definition  in  node  i) 

SOMETIMES(n,)  =  ALWAYS(n,) 

NEVER(n,)  =  {v  e  VAR_SET  I  v  does  not  have  a  global  definition  in  node  i} 

ALWAYS(p,)  =  (v  G  VAR_SET  I  v  has  a  global  definition  on  every  path 

inPATHS(p,)} 
SOMETIMES(p,)  =  {v  6  VAR_SET  I  v  has  a  global  definition  on  some  path 

in  PATHSfp,)} 
NEVER(p,)  =  {v  G  VAR_SET  I  v  has  a  global  definition  on  no  path 

inPATHS(p,)) 

ALWAYS((ap))  =  ALWAYS(a)  u  ALWAYS(P) 
SOMETIMES((ap))  =  SOMETIMES(a)  u  SOMETIMES(p) 
NEVER((a-p))  =  NEVER(a)  n  NEVER(p) 

ALWAYS((ai^3))  =  ALWAYS(a)  n  ALWAYS(p) 
SOMETIMES((a,^P))  =  SOMETIMES(a)  u  SOMETIMES(p) 
NEVER((a^^P))  =  NEVER(a)  n  NEVER(p) 

ALWAYS{a*)  =  0 
SOMETIMES(a*)  =  SOMETIMES(a) 
NEVER(a*)  =  NEVER(a) 

It  is  straight  forward  to  show  that  the  following  properties  hold  for  all  path  expres- 
sions a  which  are  not  equivalent  to  0: 

1 .  ALWAYSCa)  c  SOMETIMES(a) 

2.  NEVER(a)  =  COMPLEMENT(SOMETIMES(a)) 


V    G 


ALWAYS  (a) 

SOMETIMES  {ai     if  and   only   if  v   has   a   global   definition   on 


all 

some 

no 


NEVER  (a) 
ne  PATHS(a). 

4.      If  a  =  p  then  ALWAYS(a)=ALWAYS(p).  SOMETIMES(a)=SOMETIMES(P). 
and  NEVER(a)=NEVER(p). 
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4.  Partial  Symbolic  Evaluation  of  Path  Expressions 

Symbolic  evaluation  techniques  aim  to  derive  algebraic  expressions  summarizing 
the  effect  of  executing  a  path  or  set  of  paths  through  a  program.  Symbolic  execution 
derives  an  expression  representing  a  single  path  through  a  program.  Global  symbolic 
evaluation  attempts  to  derive  an  expression  representing  an  entire  program. 

Most  symbolic  execution  methods  attempt  to  derive  both  a  path  condition  describ- 
ing the  set  of  input  values  which  cause  the  path  to  be  executed,  and  a  path  computation, 
describing  how  the  values  of  variables  are  transformed  when  the  path  is  executed. 
Several  symbolic  execution  systems  have  been  described  in  the  literature  and  have  been 
implemented  [KIN76,HOW78b,CHE79,CLA81,KEM85].  Global  symbolic  evaluation  is 
a  much  more  difficult  technique,  the  goal  of  which  is  the  derivation  of  an  expression 
which  summarizes  the  entire  effect  of  a  procedure.  Its  development  has  been  impeded  by 
serious  theoretical  and  practical  obstacles. 

In  this  section,  we  describe  a  new  symbolic  evaluation  technique,  partial  symbolic 
evaluation  of  path  expressions.  This  method  is  more  general  than  symbolic  execution, 
yet  essentially  no  more  expensive.  It  can  provide  some  but  not  all  of  the  information 
which  global  symbolic  evaluation  would  provide. 

We  begin  with  a  formal  description  of  symbolic  execution.  We  then  present  an 
example  of  symbolic  execution.  This  is  followed  by  a  formal  description  of  partial  sym- 
bolic evaluation  of  path  expressions  followed  by  an  example.  Finally,  we  give  a  brief 
description  of  global  symbolic  evaluation  and  compare  the  three  techniques.  Initially  we 
will  assume  that  there  are  no  procedure  or  function  calls  along  any  of  the  paths  being 
symbolically  evaluated.  We  will  relax  this  assumption  later. 
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The  systems  described  in  the  literature  symbolically  execute  paths  from  the 
program's  entry  to  its  exit.  Our  heuristic  method  for  attempting  to  determine  whether  a 
definition-use  association  is  executable  involves  deriving  path  conditions  and  path  com- 
putations for  arbitrary  paths.  We  generalize  symbolic  execution  in  a  straight-forward 
way  to  allow  for  this. 

Let  P  be  a  procedure  and  let  V ■[,... ,V„  be  the  program  variables  which  are  visible  to 
P  (i.e.,  local  variables,  formal  parameters  and  non-local  variables).  Let  T^,...,T„  be  the 
types  of  V'],...,V'„,  respectively;  let  S,  be  the  set  of  values  which  can  be  assumed  by  vari- 
ables of  type  7",;  and  let  U  =  Sj  u  ...  u  5„.  For  1  <  i  <  n,  let  5/  =  S,  u  (T)  where  we 
think  of  "T"  as  denoting  the  value  "undefined".  Let  5'„  +  i  =  U*,  the  set  of  finite 
sequences  of  elements  of  U. 

We  define  the  state  space  ofP  to  be  the  set 

S  =  Si    X  ...  X  S  „+i. 

A  state  V  =  (v'l  ,...,v„^i)  is  an  element  of  S.   If  v  is  a  state,  we  think  of  V] v„  as 

representing  the  values  of  the  variables  V ^ V^  and  of  v„+i  =  {X\,X2,...)  as  represent- 
ing the  sequence  in  the  input  file.  For  notational  convenience  we  are  assuming  that  P 
gets  its  input  from  a  single  file  input.  If  P  took  input  from  k  >  1  files,  we  could  let 
•S'„*2 S'„.,i=U*  represent  these  files,  and  let  S  =5/  x  ...  x5'„+4. 

To  each  path  n  in  P  there  corresponds  a  subset  D"  of  S  called  the  path  domain  ofn 
defined  by 

D"  =  |v  e  S  I  if  at  the  top  of  path  n  the  values  of  variables  V'j V,,  are  V] v„. 

respectively,  and  the  sequence  of  values  in  ilic  input  file  is  \-„,\  then  path  re  will  be 
executed) 
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and  a  function  f-  =(f\,...J^^\ ) :  D"  ^  S  which  transforms  an  initial  state  to  di  final  state. 

The  goal  of  symbolic  execution  is  the  derivation  of  a  predicate  PC^{\),  called  the 
path  condition,  such  that 

PC"(v)  holds  if  and  only  if  v  €  D" 
and  a  predicate  C(9MP"(v,w)=C(9M/'7(v,w)&  ...  &COMP;j(v,w),  called  the  path  com- 
putation, such  that 

COM/'f  (v,w)  holds  if  and  only  if  w,  =f-{y) 

Thus,  the  path  condition  of  it  describes  the  path  domain  of  7t  and  the  path  computation 
describes  the  way  the  values  of  variables  are  transformed  when  7t  is  executed. 

Note  that  if  /'C"=FALSE  then  D"  is  empty,  so  7t  is  unexecutable.  However  the 
converse  does  not  hold.  Recall  that  a  path  is  executable  if  it  is  a  subpath  of  a  complete 
executable  path.  Thus,  if  the  only  complete  paths  having  7:  as  a  subpath  are  unexecut- 
able, then  71  is  unexecutable,  even  if  FC"  is  satisfiable.  If  7t  is  a  complete  path,  then  n  is 
executable  if  and  only  if  fC"  is  satisfiable  if  and  only  if  Z)"  is  non-empty. 

We  now  define  the  predicates  COMP"^  and  PC".  We  will  first  define  COMP^  and 
PC^  when  7:  consists  of  at  most  one  node.  We  will  define  COMP^^  in  terms  of  COMP^ 
and  COMPP  and  define  PC"P  in  terms  of  PC,  PC  P  and  COMP"" . 

Let  Ti  =  X,  the  null  path.  Then  PC^Cv)  =  TRUE  and  COMP""  (v,w)  =  (v  =  w). 

Let  7C  =  (ni)  be  a  path  consisting  of  a  single  node.  Then  PC"  =  TRUE  and 
COA/f  "(v,w):  is  the  predicate  computed  by  the  following  algorithm: 


type  Symbolic^ Value  :  character  string; 

SymboIic_Sequence  :  sequence  of  characters; 
Symbolic_State  : 
record 
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v:  Symbolic_Value; 

in:  Symbolic_Sequence; 
end; 
{To  avoid  confusion  we  will  refer  to  variables  used  in  this  algorithm  as  "variables"  and 
to  variables  occurring  in  the  procedure  which  is  being  symbolically  executed  as  "pro- 
gram variables".  Notation:  we  will  denote  variables  of  type  Symbolic_State  by  bold-face 
letters  and  their  fields  by  subscripted  letters.  For  example  if  x  is  a  variable  of  type 
Symbolic_State  then  X],...,Xn  stand  for  x.v[l],...,x.v[n]  which  represent  the  symbolic 
values  of  the  program  variables  Vi,...,V„;  x^+i  stands  for  x.in  which  represents  the  sym- 
bolic value  of  the  input  file.  We  assume  that  the  elements  of  the  sequence  Xn+i  are  dis- 
tinct from  one  another. } 

function  Symbolically_Execute_Node  (n:  Node;v,w:Symbolic_State):  Predicate; 
(Remms  an  expression  representing  the  predicate  COMP^{\,w)} 
var  X  :  Symbolic_State; 
begin 

for  i  :=  1  to  n-t-1  do  a:,  :=  "v,"; 

for  sc  :=  1  to  (number  of  statements  in  block)  do 

begin 

stmt  :=  sc'^  statement; 
Symbolically_Execute_Statement(stmt,x); 

end; 

for  i  :=  1  to  n  do 

COMPf  :=  CONCAT("h'  ",  "='\x,y, 

COMP''  :=  COMPOS:. ..SiCOMPl 

returnCCOA/P"): 
end;  {Symbolically_Execute_Node) 

procedure  Symbolically_Execute_Statement  (stmt  :  StatementType;  var  x  :  Symbolic_State); 
begin 

if  stmt  is  an  input  statement, 

"read(V,  J  ,...,\',^;"  then 
begin 

j:,,     :=Head(x„+i); 

A^.i  :=Tail(A:„+i); 

x,^    :=Head(x„  +  ]); 

Xr,^x  :=Tail(.x„.i); 
end; 
if  stmt  is  an  assignment  statement, 

"\\  :=  cxpr" 
then    .t,    :=  Symbolically-Evaluate-Expr(cxpr.x); 
end.  ( S ymKilically-Evaluatc-Staiemcnt } 

fiiiuiion  S\nilK)licall\-l  \<iluale-I'\pr(expr,x):  character_string; 

( Returns  the  sinng  tonncd  hy  replacing  each  occurrence  of  program  variable  \\  in  cxpr 
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by  the  value  of  variable  x, .  For  example,  if  expr  h  "V^+V2"  and  the  values  of  Xi  andj:2 
are  "V3*V4"  and  "5",  respectively,  then  Symbolically-Evaluate-Expr  returns  the  character 
string  "v3*V4+5".  This  could  be  defined  formally  in  terms  of  functions 
Symbolically_Execute_Term  and  Symbolically_Execute_Factor  and  a  parse  of  the 
expression.  ) 


Let  o  =  np  be  the  concatenation  of  the  paths  n  and  p.  The  function  /°  correspond- 
ing to  o  is  the  composition  of  the  functions/"  and/*'  corresponding  to  the  paths  n  and  p. 
Thus, 

COMP'^iwWf)  =  COMP''{\y)  &  COMP^{\\\v). 

LetBC'^(Vi,...,V''„)  be  the  branch  condition  on  the  edge  between  the  last  node  of  ;t 
and  the  first  node  of  p.  Then 

PC^Cv)  =  PC^Cv)  &  COMP^Cv.v')  &  PC^iy')  &.  BC'Piy'). 

It  is  clear  that  the  predicates  COMP^  and  PC^  satisfy  the  desired  propenies. 

We  can  summarize  all  of  the  information  obtained  from  symbolically  executing  a 
path  7t  in  the  predicate  5£"(  v,w  )  given  by 

5£"=/'C"&C0M/'". 

Thus,  (v,w)  satisfies  S£"(v,w)  if  and  only  if  v  e  D"  and  /"  transforms  v  into  w. 

These  recursive  definitions  of  COMP^  and  PC^  lead  in  an  obvious  way  to  a  non- 
deterministic  recursive  algorithm  for  computing  the  predicates: 

iflength(Tt)  =  0  then  CO\fP'':=COMP^ 
else  if  length(7r)  =  1  then 

compute  COMP^  by  above  algorithm 
else 
begin 

let  7t=op  where  length(o),length(p)>l; 

compute  COMP^  from  COMP^  and  COMP^  according  to  the  definition 
end; 
Specifying  a  particular  way  to  decompose  n  into  a  and  p  gives  a  deterministic  algorithm. 
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If  on  each  recursive  call  n  is  decomposed  in  such  a  way  that  length(a)=l  the  algorithm 
which  results  is  the  forward  expansion  algorithm  described  in  the  literature.  If  n  is 
decomposed  in  such  a  way  that  length(p)=l  the  algorithm  which  results  is  similar  to  the 
the  backward  expansion  algorithm  described  in  the  literature. 

We  now  present  an  example  of  symbolic  execution.  Consider  the  bubbleson  pro- 
gram shown  in  Figure  5.2.2.  We  will  denote  the  initial  values  of  the  variables  i,  a,  n. 
imp,  switch,  and  inpui  by  /°,  a°,  n^,  tmp^,  switch^,  and  input^  respectively,  and  the  final 
values  by  /^,  a^,  n^,  tmp^ ,  switch^,  and  input^  respectively.  Thus  we  will  be  trying  to 
derive  the  expression 

5£"(  i^ ,a^ ,n^ ,rmp'^ ,switch^ ,inpui^ ,  i^,a^,n^,tmp^,switch^,input^) 

Let  7t  be  the  path  (6,7,8,9,10,12,8,13,6,14).  We  initialize  the  path  condition  to 
TRUE.  Figure  5.4.1  shows  the  symbolic  values  of  the  variables  and  the  path  condition 
after  symbolic  execution  of  each  subpath.  We  have  only  shown  those  variables  which 
are  changed  by/"  or  which  are  involved  in  PC".  The  entry  in  the  row  for  subpath  pn,  in 
the  column  labelled  BC  is  the  predicate  BC'"''(w)  &  CO\fpP(\^,w). 

To  obtain  COMP^  we  set  the  final  values  of  variables  equal  to  the  symbolic  values. 
To  obtain  the  path  condition  we  conjoin  the  predicates  BC.  Conjoining  COMP^'  and 
PC"  gives 

SE^{  i^ ,a^ ,n^ ,rmp^  jwitch^ ,inpui^ ,  i^,a^,nf,tmp^,swiich^,input^)  = 
(a°|Il>a"|21)    &    (n"=2)    &    (switch^)    &    (c2^[\]^i^\2])    &    {a^[2]=<2^[\])    & 
(fl^[3)=a"(3I)  &  (a^|4]=u"l4])  &  (a^[5]=a^\5])  &  (i^=2)  &  (n^=l)  &  (s^^iich^)  & 
{inpui^=inpui^) 

Thus,  symbolic  execution  of  n  shows  that  in  order  for  n  to  be  executed,  the  value  of 
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n  must  be  equal  to  2,  the  value  of  swiich  must  be  TRUE,  and  the  value  of  a[l]  must  be 
greater  than  the  value  of  a[2]  at  the  beginning  of  the  path.  After  executing  such  a  path, 
the  value  of  /  will  be  2,  the  value  of  a[l]  and  a[2]  will  have  been  swapped,  and  the  value 
of  n  will  be  1. 
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We  now  describe  partial  evaluation  of  path  expressions.  Let  a  be  a  valid  path 
expression.  The  goal  of  partial  symbolic  evaluation  of  a  is  the  derivation  of  the  predi- 
cates PCOMP'^i\,w),  the  partial  path  compulation,  and  PPC^(y),  the  partial  path  con- 
dition. Intuitively,  PPC  will  partially  describe  the  common  features  of  the  path  domains 
of  paths  in  PATHS(a)  and  PCOMP  will  panially  describe  the  common  features  of  path 
computations  corresponding  to  paths  in  PATHS(a). 

Let  (3  be  a  sub-path-expression  of  a  consisting  only  of  node  symbols,  concatenation 
symbols,  lambdas  and  parentheses.  Then  PATHS(p)  consists  of  a  single  path,  n.  In  this 
case,  partial  symbolic  evaluation  of  P  will  be  essentially  the  same  as  symbolic  execution 
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of  n.  On  the  other  hand,  if  p  contains  loop  symbols,  union  symbols,  or  stars,  then 
PATHS(P)  will  have  more  than  one  element.  Unlike  global  symbolic  evaluation,  panial 
symbolic  evaluation  will  not  attempt  to  consider  all  of  the  details  of  each  of  the  (possibly 
infinitely  many)  elements  of  PATHS(P).  Rather,  it  will  assign  a  new  symbolic  value  to 
any  variable  which  has  a  global  definition  on  at  least  one  path  in  PATHS(p).  Intuitively, 
if  there  is  more  than  one  way  in  which  the  value  of  a  variable  can  be  changed  by  the 
functions  corresponding  to  PATHS(P),  then  the  value  of  that  variable  is  unknown  after 
partially  symbolically  evaluating  p.  A  precise  definition  of  partial  symbolic  evaluation 
follows. 

We  define  PCOMP^{\\w)  recursively  as  follows: 

1.  If  a  =  n,  or  a  =  X  then  PCOMP°-(y,yi)  =  COMP^Cv.w)  where  7i  is  the  unique  ele- 
ment of  PATHS(a). 

2.  Ifa  =  0thenPCOM/'"(v,w)  =  FALSE. 

3.  Forae  {p.,  (PuyX  P*) 

Let  SOMETIMES(a)  =  (V,, V,J.  Then 

f  u',  =UNIQUE  NEW  SYMBOL      if  /  e  fi  i  ij 

I  w\  =  V,  otherwise 

4.  If  a  =  (3y)  then  PCOMP°(v,w)  =  PCOMP^{\y')  &  PCOMP^(\',vf). 

The  definition  of  PPC^i\)  for  a  path  expression  a  is  similar  to  the  definition  of 
PC^iWw)  for  a  path  n. 

1 .  If  a  is  a  single  symbol  n,  .p, .  X.  or  a  =  (P^^/)  or  a  =  p*  then 
P/'r°{v)  =  TRUE. 

2.  If  a  IS  0  then  PPC°(\)  s  FALSE. 
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3.      If  a  is  (Py)  then 

PPCiy)  =  PPC^iv)  &  PCOMP^{\y)  &  PPCH\')  &  BC^Hxy, 

We  summarize  all  of  the  information  obtained  from  partially  symbolically  evaluat- 
ing a  path  expression  a  with  the  predicate  /'S£'°(v,w)  given  by 

P5£°  =  PPC^  &  PCOMP"^. 

We  now  present  an  example  of  partial  symbolic  evaluation.  Consider  again  the 
bubbleson  program  shown  in  Figure  5.2.2.  Let  a  be  the  path  expression 

6-7p8l3  6  14. 
We  initialize  the  symbolic  values  of  the  variables  /,  a,  n,  and  switch  to  /  ,  a°,  n  ,  and 
switch^,  respectively,  and  we  initialize  the  path  condition  to  TRUE.  Figure  5.4.2  shows 
the  symbolic  values  of  the  variables  and  the  path  condition  after  executing  each  subpath. 
Note  that  in  symbolically  executing  the  loop  symbol  pg  we  have  given  new  symbolic 
values  to  the  variables  a,  switch,  and  i  because  each  of  these  variables  is  potentially 
redefined  in  the  loop  headed  by  node  8. 

Setting  the  final  values  of  variables  equal  to  the  symbolic  values,  conjoining  the 
entries  in  the  column  labelled  BC,  and  simplifying,  we  get 

PSE'^ii^ ,a^ ,n^ , switch^ Mpui^ ,  i^^ ,n^, switch^, input^  = 

{af=a')  &  (i^=i')  &  («^=rt°-l)  &  {,swUchf=switch')  &  n°>2  &.  switch^  &  r>n°-\ 

&  {n^-\<2v  noi  switch') 

=  n^>2&  switch^  &  i^  >  n^-\  &  (n^=3  v  not  switch^) 

Thus,  partial  symbolic  evaluation  of  a  tells  us  that  in  order  for  a  path  in  PATHS(a) 
to  be  executed,  the  value  of  n  must  be  greater  than  or  equal  to  2  and  the  value  of  switch 
must  be  TRUE  at  the  beginning  of  the  path;  after  executing  such  a  path,  the  value  of  / 
will  be  strictly  greater  than  one  less  than  the  initial  value  of  n;  furthermore,  if  the  initial 
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value  of  n  was  not  equal  to  3,  then  the  final  value  of  switch  must  be  FALSE.  This  is 
some,  but  not  all  of  the  information  which,  in  theory,  global  symbolic  evaluation  of  the 
set  of  paths  might  provide. 


subpath 
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a° 

/" 

n' 
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/' 
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n°-l 

i'>n°-\ 
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6  7p,  13-614 

(n°-l  <2)  vnoi.rK7fc/i' 

Figure  5.4.2 

For  comparison,  we  will  now  briefly  describe  global  symbolic  evaluation.  The 
descriptions  of  global  symbolic  evaluation  in  the  literature  deal  with  global  symbolic 
evaluation  of  an  entire  routine.  The  goal  is  to  derive  a  single  expression  which  summar- 
izes the  effect  of  executing  any  path  through  the  routine.  It  is  straight  forward  to  general- 
ize this  to  global  symbolic  evaluation  of  a  path  expression.  Given  a  path  expression  a 
the  goal  of  global  symbolic  evaluation  is  the  derivation  of  a  predicate  G5£°(v,w)  given 
by 

GSE°^  =        V       S£". 

ne  PATHS  (a) 

(v.w)  satisfies  G5E°(v.w)  if  and  only  if  for  some  TtePATHSfa).  v  e  D'^  and  f" 
transforms  v  into  w.  Thus  GSE^  gives  a  complete  description  of  all  of  the  paths  in 
PATlLS(a). 

Unfonunately,  and  not  surprisingly,  there  are  serious  obstacles  to  automating  global 

symbolic  evaluation    If  P.\THS(a)  =  (Tt] n„)  is  finite  then  it  is  fat  least  in  principle) 

possible  to  automatically  derive  GSE^.  In  this  case  GSE°  will  be  of  the  form 
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(/>C"'  ^  COMP""^ )  &...&  (PC""  ->  COMP''") 

The  length  of  this  expression  may  be  exponential  in  the  length  of  a. 

The  situation  is  even  worse  if  a  contains  loop  symbols  or  stars.  In  this  case, 
PATHS(a)  is  infinite  and  it  may  be  impossible  to  automatically  derive  GSE^.  Global 
symbolic  evaluation  methods  described  in  the  literature  attempt  to  derive  and  solve 
recurrence  relations  describing  the  effects  of  loops.  However,  it  is  not  always  possible  to 
do  so.  Nested  loops  and  selection  statements  within  loops  are  particularly  problematic. 

Consider  the  path  expression  a=6-7-pg-13-6-14  to  which  we  applied  partial  sym- 
bolic evaluation  above.  Manually  analyzing  the  set  of  paths  represented  by  a  shows  that 

GS£°(v,w)= 

in^>2)  &  switch'^  & 

((n^  >  2)  ^  i(a°{n^-]]<a^[n%  &  (af[n^]=a^{n%)) 

&  (Vj)((l<j<«°)^(a/[«0]>a-^[j])) 

This  expression  gives  more  detail  about  the  effect  of  the  paths  through  the  loop  headed 
by  node  6.  Unfortunately,  however,  the  global  symbolic  evaluation  methods  described  in 
the  literature  are  not  powerful  enough  to  derive  GSE°^  for  this  path  expression. 

The  following  facts  about  the  relationship  between  the  symbolic  execution,  partial 
symbolic  evaluation,  and  global  symbolic  evaluation  are  easy  to  verify: 

1.  For  each  K€  PATHS  (a).  Sf'' =>P5£" 

2.  CS£"=>/'5£" 

3.  If  PATHS(a)=  { n }  then  5E"  =  PSE"  =  GSE"" 
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When  PATHS(a)  consists  of  a  single  path,  the  three  methods  are  equivalent.  Global 
symbolic  evaluation  is  more  powerful  than  partial  symbolic  evaluation,  but  partial  sym- 
bolic evaluation  is  much  more  practical,  being  essentially  no  more  difficult  than  symbolic 
execution. 

We  conclude  this  section  with  some  comments  on  procedure  calls.  Paths  containing 
procedure  calls  are  problematic  for  symbolic  execution  systems.  (Note  that  since  we  are 
assuming  that  functions  cannot  take  var  parameters  or  have  side  effects,  they  do  not 
present  the  same  problems  as  procedure  calls  do.)  Let  7i  be  a  path  consisting  of  a  single 
node  whose  only  statement  is  a  procedure  call.  One  would  like  5£"(v,w)  to  hold  if  and 
only  if  when  the  procedure  is  called  with  initial  state  v  the  procedure  terminates  and  the 
state  upon  returning  from  the  call  is  w.  However,  since  there  is  no  way  to  determine  a 
priori  the  path  which  will  be  followed  through  the  procedure,  in  order  to  derive  S£"(v,w) 
one  would  have  to  employ  global  symbolic  evaluation. 

One  approach  which  has  been  suggested  for  symbolically  executing  paths  with  pro- 
cedure calls  is  the  replacement  of  the  node  corresponding  to  the  procedure  call  by  a  par- 
ticular path  through  the  called  procedure.  This  is  equivalent  to  disallowing  procedure 
calls,  and  seems  overly  restrictive.  Another  approach  is  to  replace  the  procedure  call  by 
an  executable  specification  for  the  procedure.  This  seems  more  reasonable,  but  it 
assumes  the  availability  of  an  executable  specification  for  each  procedure  and  assumes 
that  the  procedure  correctly  implements  that  specification. 

For  partial  symbolic  evaluation,  on  the  other  hand,  there  is  an  easy  way  to  treat 
nodes  having  procedure  calls.  The  approach  is  consistent  with  the  treatment  of  other 
symbols  corresponding  to  more  than  one  path.  If  a  is  a  node  whose  only  statement  is  a 
procedure  call  then  wc  simply  define 
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,  H'  ^UNIQUE  NEW  SYMBOL      if  /  €{i^  ,  ...Jh) 
PCOMPfi\,w)=]  -         -  ,        . 

',  =  V,  otherwise 

where  V;j  ,...,V,^  are  the  variables  which  can  potentially  be  defined  by  the  procedure  call 
(actual  parameters  corresponding  to  var  formal  parameter  and  global  variables  which  are 
modified  in  the  called  procedure). 

5.  Precise  Description  of  the  Path  Expression  Method 

In  this  section  we  give  a  precise  description  of  the  path  expression  method.  Let 
(d,u,v)  be  definition-use  association.  The  path  expression  method  will  attempt  to  deter- 
mine that  (d,u,v)  is  unexecutable,  that  is,  that  there  is  no  executable  path  from  the 
procedure's  entry  to  its  exit  which  has  as  a  subpath  a  definition-clear  path  with  respect  to 
V  from  d  to  u. 

We  can  assume,  without  loss  of  generality,  that  u  is  a  node,  rather  than  an  edge. 
This  is  because  we  have  defined  structured  flow  graphs  in  such  a  way  as  to  insure  that  if 
(i,j)  is  an  edge  having  a  p-use  of  some  variable,  then  there  is  no  edge  (i,k)  for  k?tj.  Thus, 
if  (d,(i,j),v)  is  a  definition-p-use  association,  the  paths  from  d  to  (ij)  are  precisely  the 
paths  from  d  to  j. 

A  candidate  is  a  path  from  the  procedure's  entry  to  its  exit  which  has  a  subpath 
going  from  d  to  u.  A  candidate  is  viable  if  it  is  executable  and  if  it  has  a  definition  clear 
subpath  with  respect  to  v  from  d  to  u.  Thus,  the  definition-use  association  is  executable 
if  and  only  if  the  set  of  viable  candidates  is  non-empty. 

We  will  use  a  path  expression  ^  to  describe  the  set  of  candidates.  As  the  path 
expression  method  progresses,  we  will  attempt  to  eliminate  from  consideration  subsets 
which  consist  only  of  candidates  which  are  not  viable.  When  such  a  subset  is  eliminated. 
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we  will  rewrite  ^  accordingly.  If  we  can  eliminate  all  of  the  candidates  from  considera- 
tion then  the  association  is  unexecutable.  In  this  case,  ^  will  be  equivalent  to  the  path 
expression  0. 

Let  a  be  a  path  expression  which  begins  with  n^  or  p^  and  let  (i,j)  be  an  edge  in  G. 
We  will  sometimes  prepend  a  with  a  symbol  n,"'''  or  the  symbol  p,"^.  This  will  indicate 
that  we  are  including  the  branch  condition  on  edge  (i,j)  in  the  partial  symbolic  evaluation 
of  a.  Similarly,  when  a  ends  with  n^  or  p^  and  (j,i)  is  an  edge  in  G  we  will  sometimes 
append  a  with  n,*^'  or  p,'"".  This  will  indicate  that  we  are  including  the  branch  condi- 
tion on  edge  (j,i)  in  the  partial  symbolic  evaluation  of  a. 

Let  T  be  the  tree  corresponding  to  a  path  expression  representing  the  set  of 
paths  from  the  entry  node  to  the  bottom  of  node  d.  Let  T  be  the  tree  corresponding  to 
a  path  expression  representing  the  set  of  paths  which  go  from  the  bottom  of  node  d  to  the 
top  of  node  u  without  passing  through  d  or  u.  Let  T  be  the  tree  corresponding  to  a 
path  expression  representing  the  set  of  paths  from  the  top  of  node  u  to  the  exit  node.  Let 
jsg  7a  •  .y-a  •  .ya  •  7hen  every  candidate  is  an  element  of  PATHS(^)  of  the  form 
Ttpo  where  ne  PATHS(a''^),  p€  PATHS(a'^"),  and  oe  PATHSia"").  The  viable  candi- 
dates are  executable  paths  of  this  form  where  p  is  a  definition-clear  path  with  respect  to 
V.  Throughout  this  section,  T^  will  stand  for  the  tree  representing  the  current  set  of  can- 
didates and  T  will  stand  for  the  current  subtree  corresponding  to  the  definition-clear 
subpaihs  with  respect  to  v  from  d  to  u. 

W'c  begin  this  section  by  defining  a  set  of  rules  which  will  be  used  to  re\>.Tite  path 
expressions.  W  c  will  then  define  some  notions  which  will  be  used  for  showing  that  cer- 
tain sets  of  paths  contain  no  executable  paths.  W'c  then  present  two  path-cxprcssion- 
mcihod  algorithms.    The  first,  PEMl,  is  a  very  general  non-dciemiinisiic  procedure 


which  is  not  guaranteed  to  terminate.  The  second,  PEM2,  is  an  efficient  but  less  general 
algorithm,  in  which  most  of  the  non-determinism  has  been  removed.  We  conclude  with  a 
discussion  of  the  implementation  of  PEM2  and  two  examples. 

We  now  define  a  set  of  rewrite  rules  which  will  be  used  in  the  path  expression 
method: 

RWl:Replace  a  subtree  7""  of  7^  by  T^ . 
RW2:Replace  a  subtree  7"  of  7^  by  7"'  where  oc=a'. 

Let  7°  be  a  subtree  of  7^.  If  7^  is  the  tree  obtained  by  applying  RW2  to  7",  i.e. 
by  replacing  the  subtree  7"  by  7"',  then  PATHS(^0=PATHS(^).  So  application  of 
RW2  results  in  a  different  representation  of  the  same  set  of  candidate  paths.  On  the  other 
hand,  if  7^'  is  the  tree  obtained  by  applying  RWl  to  7",  then  PATHS (^OcPATHS(^). 
Before  we  apply  rule  RWl,  we  will  have  to  insure  that  none  of  the  paths  in 
PATHS(^)-PATHS(^')  is  viable. 

Let  7"  be  a  subtree  of  7^.  We  will  say  that  7°  is  RWl -safe  if  and  only  if  no  viable 
candidate  in  PATHS(^)  has  a  7"-subpath. 

Let  7"  be  an  RWl -safe  subtree  of  7^.  Let  7''  be  the  tree  obtained  from  7^  by 
applying  RWl  to  7°.  Then  all  of  the  paths  in  PATHS(^)-PATHS(^')  have  7°-subpaths, 
hence  are  not  viable.  So  we  can  safely  replace  ^  by  t,\  thereby  eliminating  from  con- 
sideration some  of  the  non-viable  candidates. 

We  now  discuss  methods  for  trying  to  determine  whether  or  not  a  subtree  is  RWl- 
safe.  The  simplest  method  of  showing  a  subtree  to  be  RW-safe  uses  only  syntactic  infor- 
mation. If  7"  is  a  subtree  of  7  and  ve  ALWAYS(a)  then  no  path  having  a  7"- 
subpath  is  viable,  so  7"  is  RWl -safe. 
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Other  methcxls  of  showing  a  subtree  to  be  RWl-safe  take  into  account  the  semantics 
of  the  procedure.  Let  E  be  a  predicate  whose  free  variables  are  some  but  not  necessarily 
all  of  the  program  variables  and  let  a  be  a  path  expression.  We  will  say  that  E  is  a- 
satisfiable  if  and  only  if  there  exist  states  v  and  w  such  that  GS£"(v,w)  &  E(w).  Thus,  E 
is  a-satisfiable  if  and  only  if  for  some  path  rce  PATHS(a)  and  some  state  v,  veD"  and  /" 
transforms  v  into  w  and  w  satisfies  E.  In  particular,  if  PATHS(a)  is  the  set  of  paths  from 
procedure  entry  to  point  p  then  E  is  a-satisfiable  if  and  only  if  when  control  reaches  point 
p,  the  program  state  satisfies  predicate  E.  We  will  say  that  E  is  a-unsaiisfiable  if  and 
only  if  E  is  not  a-satisfiable. 

In  the  path  expression  method  we  will  be  interested  in  showing  that  certain  predi- 
cates are  a-unsatisfiable  for  cenain  path  expressions  a.  Since  it  is  not  in  general  possible 
to  derive  GSE"^  we  will  approximate  GSE^  with  a  weaker  predicate.  Let  C(v,w)  be  any 
predicate  such  that  C5£°=>C.  Clearly,  if  (v,\v)  satisfies  G5£"(v,w)  &  E(w)  then  (v,w) 
satisfies  C(v,w)  &  E(w).  Thus  if  C(v,w)  &  E(w)  is  unsatisfiable  then  E  is  a-unsatisfiable. 
In  particular,  if /'S£"(v,w)  &  E(w)  is  unsatisfiable  then  E  is  a-unsatisfiable. 

We  will  say  that  E  is  (x-necessary  if  and  only  if  (        V       PC'')=>E.  That  is,  if  v 

ne PATHS  (a) 

is  in  the  path  domain  of  any  element  of  PATHS(a)  then  v  satisfies  E.  For  example,  the 
predicate  PPC(a)  is  a-necessary.  It  follows  from  the  definitions  that  if  a  predicate 
E=FALSE  is  a-necessary  then  all  of  the  paths  in  PATHS(a)  are  unexecutable,  hence 
none  of  them  are  subpaths  of  viable  paths,  so  7"  is  R W 1  -safe. 

Note  that  the  question  of  whether  or  not  E  is  a-satisfiablc  pertains  to  program  states 
which  can  occur  after  executing  a  path  from  PATHS(a)  while  the  question  of  nn  heiher  or 
not  E  is  a-necessary  pertains  to  program  slates  which  must  occur  before  executing  a  path 
from  PATHS(a). 
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Let  S  be  a  set  of  program  variables  and  let  7""  be  a  subtree  of  T^.  We  will  say  that 
S  has  the  T°- -Must-Redefine  property  if  and  only  if  at  least  one  element  VeS  has  a 
definition  on  each  r°-subpath  of  a  viable  path  in  PATHS(^).  Thus,  if  S  satisfies  the  7"- 
Must-Redefine  propeny  and  S£NEVER(a)  then  no  viable  element  of  PATHS(^)  has  an 
r"-subpath.  In  other  words,  if  S  satisfies  the  7" -Must-Redefine  propeny  and 
ScNEVER(a)  then  7"  is  RWl-safe. 

Let  a=P-Y-5  be  a  sub-path-expression  of  ^,  where  p,  y,  and  5  are  sub-path- 
expressions  of  a.  Let  E(V],...,V„)  be  a  predicate  which  is  both  5-necessary  and  P- 
unsatisfiable.  Let  VARIABLES(E)  denote  the  set  of  program  variables  which  actually 
occur  in  E.  Let  ti  be  an  executable  element  of  PATHS(a).  Then  7i=pox  where 
pePATHS(p),  aePATHS(Y),  and  xePATHS(5)  are  all  executable  paths.  Let  veD".  f 
transforms  v  into  a  state  v'  which  does  not  satisfy  E  (since  E  is  P-unsatisfiable).  f^ 
transforms  v'  into  a  state  w  which  does  satisfy  E  (since  wsD'^  and  E  is  5-necessary).  So 
at  least  one  of  the  variables  in  VARIABLES(E)  must  change  value  while  path  a  is  being 
executed.  Thus,  at  least  one  of  the  variables  in  VARIABLES (E)  must  have  a  definition 
on  path  o. 

Combining  the  above  definitions  and  observations  gives 
LEMMA  5.5.1: 

Let  7"  be  a  subtree  of  7^  where  7"g7P-7^-7^   Let  E(Vi V„)  be  a  predicate 

which   is  both  6-necessar\'  and   P-unsatisfiable.    Then  the  set  VARIABLES(E) 

satisfies  the  7^-Must-Redefine  property.  If  VARIABLES(E)cNEVER(Y)  then  7^  is 

RWl-safe.H 

We  will  associate  with  each  subtree  7"  of  7^  a  collection  MUST-REDEFINE(7°) 
of  sets  of  variables,  such  that  each  Se  MUST-REDEFINE(7")  satisfies  the  7°-Must- 
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Redefine  Propeny.  Initially,  each  of  the  collections  MUST-REDEFINEfT")  will  be 
empty.  As  the  path  expression  method  progresses,  we  will  use  Lemma  5.5.1  to  derive 
elements  of  the  collections. 

The  following  observations  justify  associating  the  collection  MUST- 
REDEHNECT"")  with  the  root  of  7".  Let  T  be  a  subtree  of  7°  where  7°  is  a  subtree  of 
7^  and  let  S  be  a  set  of  program  variables  which  satisfies  the  7"-MUST-REDEFlNE 
propeny.  Let  7  and  T^  be  the  trees  obtained  by  applying  RW2  to  T.  Then  since 
PATHS(^)=PATHS(^')  and  PATHS(a)=PATHS(aO,  S  satisfies  the  7"'-Must-Redefine 
propeny  (with  respect  to  ^')-  Similarly,  if  7*^  is  RWl-safe  and  if  7"  and  7^  are  the 
trees  obtained  by  applying  RWl  to  T  then  S  satisfies  the  7  -Must-Redefine  propeny 
(with  respect  to  T^  ).  Thus  if  S  belongs  to  the  collection  MUST-REDEFINE(7")  associ- 
ated with  the  root  of  7°,  and  if  RW2  is  applied  to  a  subtree  of  7"  or  RWl  is  applied  to  a 
RWl-safe  subtree  of  7"  then  S  still  satisfies  the  7"-Must-Redefine  propeny  for  the 
transformed  tree. 

We  now  present  a  non-deterministic  procedure  which  attempts  to  determine 
whether  an  association  is  unexecutable.  It  returns  "UNEXECUTABLE"  only  if  the  asso- 
ciation actually  is  unexecutable.  We  will  then  discuss  a  strategy  for  eliminating  the  non- 
determinism  and  increasing  the  efficiency  of  the  algorithm.  An  algorithm  for  computing 
a  path  expression  representing  the  set  of  paths  from  a  node  s  to  a  node  or  edge  t  is  given 
in  Appendix  I. 
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Algorithm  5.5.1: 

function  PEMl(d,UjV'^:  (POSSIBLY-EXECUTABLE.UNEXECUTABLE); 

initialize  T^  and  7"°  ' 

for  each  subtree  7"  of  7"^  do 

begin 

MUST-REDEFINE(7P):=0; 

RWl-safe(rP):=  false 
end; 

done:=false; 
while  not  done  do 
begin 

if  ^=0  then  return(UNEXECUTABLE); 

else  select 

Dtrue  =>      Apply  RWl  to  an  RWl-safe  subtree  7"  of  7^ 

ntrue=>      Apply  RW2  to  a  subtree  7"  of  7^ 

D  true  =>      UPDATE  MUST-REDEFINE: 
choose  a  subtree  T^  of  7^; 
let  a=P-Y-5  be  the  7'>^-reduction  of  7^; 
find  a  5-necessary  predicate  E; 
if  E  is  P-unsatisfiable  then 

MUST-REDEFINE(7"0:=MUST-REDEFINE(7^)u{VARIABLES(E)}; 

Dtrue  =>     UPDATE  RWl-safe: 

Choose  a  subtree  7"  of  7"^'"  such  that  ve  ALWAYS(a) 

or  choose  a  subtree  7"  of  7^  such  that  MUST-REDEnNE(7°)cNEVER(a) 

or  choose  a  subtree  7°  of  7^  such  that  FALSE  is  a-necessary; 

RWl-safe(7"):=true; 

ntrue=>      done:=true;  {give  up) 

end  select; 
end; 
return(POSSIBLY-EXECUTABLE); 

As  indicated  above,  application  of  the  rewrite  rule  RWl  to  RWl-safe  states  does 
not  eliminate  any  viable  candidates.  Thus  every  time  the  top  of  the  while  loop  is  reached 
all  of  the  viable  candidates  belong  to  PATHS(^).  If  ^=0  then  PATHS(£)  is  empty,  so 
there  are  no  viable  candidates.  In  this  case  the  procedure  terminates,  returning  "UNEXE- 
CUTABLE".   However,  the  algorithm  is  not  guaranteed  to  terminate.   If  at  any  point  the 
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"done  :=  true"  selection  is  executed  the  algorithm  will  terminate  returning  "POSSIBLY- 
EXECUTABLE".  In  this  case  PATHS(^)  is  a  superset  of  the  set  of  viable  paths. 

It  would  be  possible  to  eliminate  the  non-determinism  in  PEMl,  by  selecting  a  par- 
ticular order  in  which  to  perform  the  operations  and  to  restrict  the  application  of  RW2  so 
as  to  avoid  the  generation  of  trees  which  had  already  been  generated  at  a  previous  step. 
However,  even  with  these  restrictions  it  would  be  possible  for  ^  to  become  arbitrarily 
long,  for  example  by  using  RW2  to  repeatedly  replace  a  sub-path-expression  a*  by 

In  the  function  PEM2,  we  will  sacrifice  some  of  the  generality  of  PEMl,  for  the 
sake  of  efficiency.  We  will  do  this  by  only  allowing  a  restricted  form  of  the  rewrite  rule 
RW2: 

RW2': 

Replace  a  subtree  7"  of  7''  by  7°  where  a=p,,  a'=(p*n,),  no  node  symbol  or 
loop  symbol  occurs  more  than  once  in  P,  and  a'=a.  (Thus  a'  represents  the  set  of 
paths  through  the  loop  headed  by  node  i.) 

RW2'  transforms  ^  into  an  equivalent  path  expression  which  includes  more  details  about 
the  body  of  some  loop. 

Function  PE.\12  uses  rewrite  rules  RWl  and  RW2',  and  thus  concentrates  attention 
on  loops  in  the  attempt  to  eliminate  non-viable  candidates.  PEM2  will  also  focus  attcn- 
tion  on  the  subtree  7  ,  applying  the  rewrite  rules  only  to  subtrees  of  this  tree.  PEM2 
also  eliminates  (most  oO  the  non-determinism  of  PEMl  by  performing  the  operations  in 
a  pre-determined  order. 
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Algorithm  5.5.2 

function  PEM2(d,u,y):  (POSSIBLY-EXECUTABLE,UNEXECUTABLE); 


lunction  PtMZca,UA'): 
Initialize  T^  and  7       ; 


for  each  subtree  7"  of  T^  do  MUST-REDEFINE(r"):=0; 
while  T        has  a  subtree  of  the  form  T^'  do 
begin 

{ 1 }     if  ^=0  then  retum(UNEXECUTABLE) 
else 

begin  ^ 

{2}  choose  a  subtree  7P'  of  T"'^'" ;  ^ 

{3)  let  a  be  the  7P' -reduction  of  7^; 

then  a=p'-p,Y'  where  p'  and  /  are  sub-path-expressions; 
let  (3  denote  P'-p,'"^  and  let  ydenote  p,'"^  Y; 
{4}  derive  y-necessary  predicates  £  i  ,...,£4; 

{5)  if£]&...&£;t=FALSEthen  (FALSE  is  y-necessary)  apply  RWl  to  7" 

else  begin 

for  i  :=  1  to  k  do 
begin 
{6}  if  £,  is  P-unsatisfiable 

{7)  then  MUST-REDEFINE(7^) 

:=MUST-REDEFINE(7Y)u{VARIABLES(E)); 
end;  (for) 
end;  (else) 
{8)  Apply  RW2' to  7P'; 

{let  7    denote  the  subtree  which  replaces  T^  ] 
[9]  Apply  RWl  to  any  subtree  7^  of  7*  such  that  ve  ALWAYS(e); 

{10}  Update  sets  ALWAYS(0,  SOMETIMES(0,  NEVER(C)  for  all 

sub-path-expressions  C,  of  ^. 
{11}  Apply  RWl  to  any  subtree  7^  of  7^  such  that 

(3SeMUST-REDEFINE(7^)  s.t.  ScNEVER(e)). 
end;  {else] 
end;  (while) 

if  PPC(^)=FALSE  then  (FALSE  is  ^-necessary)  return(UNEXECUTABLE) 
elsereturn(POSSIBLY-EXECUTABLE); 

Each  time  the  top  of  the  while  loop  in  PEM2  is  reached,  the  set  of  viable  candidates 
is  contained  in  PATHS(^).  Thus,  PEM2  returns  "UNEXECUTABLE"  only  if  the 
definition-use  association  (d,u,v)  is  unexecutable.  The  algorithm  terminates  either  when 
b,=0  or  when  there  are  no  more  loop  symbols  to  which  to  apply  RW2'.  If  PEM2  returns 
"POSSIBLY-EXECUTABLE"  then  the  final  value  of  S,  represents  a  super-set  of  the  set 
of  viable  candidates. 
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We  will  next  address  a  number  of  implementation  issues.  We  assume  the  availabil- 
ity of  a  partial  symbolic  evaluator  and  a  theorem  prover. 

In  line  1  the  algorithm  has  to  determine  whether  ^  is  equivalent  to  0.  One  way  to 
make  it  easier  to  do  this  is  by  performing  the  following  simplifications  whenever  T''  is 
modified: 

1.  Replace  subtrees  of  the  form  T^T^  or  T^-T^  by  T^. 

2.  Replace  subtrees  of  the  form  T^[^T^  orT^y^T^  by  T^. 

3.  Replace  subtrees  of  the  form  r^T^  or  r^T^  by  7^. 

4.  Replace  subtrees  of  the  form  T^*  by  T^. 

5.  Replace  subtrees  of  the  form  7   *  by  7   . 

These  simplifications  can  be  performed  in  a  single  post-order  traversal  of  the  tree  7''. 
After  performing  these  simplifications,  ^=0  if  and  only  if  ^=0. 

Each  time  execution  reaches  line  2  the  algorithm  chooses  a  subtree,  T^\  on  which 
to  concentrate.  This  non-determinism  can  be  eliminated  by  specifying  a  particular  order 
in  which  loop  symbols  are  to  be  processed.  For  example  it  could  be  stipulated  that  loop 
symbols  are  to  be  processed  in  increasing  order  of  depth  of  nesting  and  from  left  to  right 
within  a  given  depth  of  nesting.  More  work  is  needed  to  determine  how  the  order  in 
which  loop  symbols  are  processed  effects  the  efficiency  and  power  of  the  algorithm. 

In  line  4,  the  algorithm  derives  a  set  [E\ Ei  ]  of  Y-necessar>'  predicates.  In  order 

to  derive  all  of  the  y  necessary  predicates,  we  would  have  to  employ  global  symbolic 
evaluation  of  y  However,  we  can  derive  some  of  them  by  using  partial  symbolic  evalua- 
tion. Recall  that  the  the  predicate  PPC^  is  y-ncccssary'  So  we  can  derive  this  family  by 
writing  PPC^  as  a  conjunct  of  clauses,  PPC^Cv^s  £,(v)&...&£a(v). 
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Similarly,  in  line  6,  the  algorithm  must  determine  whether  £,  is  P-unsatisfiable,  i.e. 
whether  G5£^(v,w)&£,(w)  is  unsatisfiable.  To  get  a  precise  answer  to  this  question  we 
would  need  to  perform  global  symbolic  evaluation  to  derive  the  predicate  and  to  employ 
a  sufficiently  powerful  theorem  prover  to  check  its  satisfiability.  It  is  not  in  general  pos- 
sible either  to  perform  global  symbolic  evaluation  or  to  determine  whether  an  arbitrary 
predicate  is  satisfiable.  However,  we  can  perform  the  following  approximation.  As 
pointed  out  earlier,  if  PSE^{\,w)&Ei{w)  is  unsatisfiable  then  £,  is  (i-unsatisfiable.  If  the 
theorem  prover  can  show  that  /'S£^(v,w)&£,(w)  is  unsatisfiable  then  we  know  that  £,■  is 
p-unsatisfiable.  If  the  theorem  prover  determines  that  /'S£^(v,w)&£,(w)  is  satisfiable  or 
if  it  cannot  reach  a  conclusion  (within  some  predetermined  amount  of  time)  then  PEM2 
can  make  the  conservative  assumption  that  £,  is  not  P-unsatisfiable  (although  in  fact  it 
might  be.) 

The  sets  ALWAYS,  SOMETIMES,  and  NEVER  can  easily  be  updated  (line  10) 
whenever  7''  is  modified  by  performing  a  single  post-order  traversal  of  the  tree,  applying 
the  definitions  of  the  sets.  At  the  same  time,  RWl  can  be  applied  to  subtrees  which  have 
become  RWl -safe  by  virtue  of  the  updates  (line  11). 

Note  that  given  a  partial  symbolic  evaluator,  a  theorem  prover,  and  a  strategy  for 
choosing  the  subtree  7^'  on  which  to  concentrate,  PEM2  becomes  a  fully  automatic  pro- 
cedure. We  now  show  that  the  time-complexity  of  PEM2  is  polynomial  in  the  length  of 
the  subject  program. 

The  algorithm  for  initializing  7"        produces  a  path  expression  in  which  each  loop 

d,u 

symbol  appears  at  most  twice.  Let  p,  be  a  loop  symbol  appearing  in  T"  '  .  The  loop 
symbols  which  are  generated  by  repeatedly  applying  RW2'  to  7"^  are  all  distinct  from 
one  another.  So  the  total  number  of  passes  through  the  while  loop  is  less  than  or  equal  to 
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IL "  where  L  is  the  number  of  loops  in  the  procedure. 

We  can  also  guarantee  that  at  any  point  in  the  algorithm,  the  length  of  ^  is  O(N^) 
where  N  is  the  number  of  nodes  in  the  flow  graph  (see  Appendix  II).  Let  M  be  the 
number  of  characters  in  the  subject  program.  Clearly  N=0(M)  and  L=OfM). 

In  line  1,  PEM2  checks  whether  ^=0.  This  can  be  done  in  O(N^)  time  by  simplify- 
ing T''  during  one  traversal  of  the  tree.  There  are  many  possible  strategies  for  choosing  a 
subtree  T^'  on  which  to  concentrate  in  line  2.  Some  of  these  strategies  take  only  0(1) 
time.  Finding  the  7  "-reduction  of  7S  and  the  corresponding  sub-path-expressions  (3  and 
^  in  step  3  can  be  done  with  one  traversal  of  the  tree,  hence  in  O(N^)  time.  Partial  sym- 
bolic evaluation  of  a  path  expression  yean  be  done  in  time  proponional  to  the  length  of  y 
which  is  O(N^)  so  step  4  can  be  performed  in  0(A'^)  time.  Since  the  number  of  clauses 
in  each  branch  is  0(M),  the  number,  k.  of  predicates  £,  derived  in  step  4  is 
OiN^M)=0(M-). 

Since,  as  discussed  above,  the  theorem  prover  can  be  forced  to  return  after  some 
predetermined  constant  amount  of  time,  each  execution  of  step  5  and  of  6  takes  0(1) 
time.  The  for  loop  is  executed  O(M^)  times,  and  each  execution  takes  00)  time.  Steps 
8,  9,  and  10  can  each  be  performed  in  0(/V')  time.  In  step  1 1  there  are  O(N')  subtrees  to 
examine,  each  of  which  can  potentially  have  OCA/-')  elements  in  its  MUST-REDEFINE 
SCI.  so  step  1 1  takes  0{N^M^)=0(M^)  time. 

Thus  PEM2  runs  in  time  bounded  by  0(L^M^)=0{M^).  The  above  estimate  was 
very  crude,  and  was  strongly  affected  by  the  fact  that  k=0(M').  This  is  not  a  tight  upper 
bound  on  k.  Furthermore,  in  practice,  k  will  generally  be  a  small  number,  and  the  algo- 
rithm will  run  more  efficiently  than  this  worst-case  upper  bound  would  indicate. 
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It  is  also  possible  to  implement  PEM2  in  such  a  way  that  only  the  "relevant"  loops 
need  be  examined.  This  improves  the  run  time  in  practice.  To  do  this,  it  is  necessary  to 
make  conservative  estimates  of  the  sets  ALWAYS(p,),  SOMETIMES(p,),  NEVER(p,) 
as  follows.  Let  the  definitions  oi  ALWAYS',  SOMETIMES',  and  NEVER',  be  analogous 
to  those  of  ALWAYS,  SOMETIMES,  and  NEVER  except  that 
ALWAYS'ip,)=ALVJAYS{n,),  5C>A/£T/M£S'(p,)=VAR_SET;  and  NEVER'ip,)=0. 
Then  for  any  path  expression  a,  ALW AY  S'{a)Q  ALWAYS  (a), 
SC>M£T/MES'(a)2S0METIMES(a),  and  NEVER'ia)^NEVER(a).  Therefore,  any  sub- 
tree which  is  marked  as  being  RWl-safe,  using  the  sets  ALWAYS'  and  NEVER'  will  in 
fact  be  RWl-safe.  Also,  when  performing  panial  symbolic  evaluation,  any  variable 
which  should  be  given  a  unique  new  symbolic  value  will  be  given  one. 

We  next  present  two  examples  of  the  operation  of  PEM2.  In  the  first,  we  re- 
examine the  bubble-son  procedure  presented  in  section  5.2.  In  the  second  we  examine  a 
program  in  which  the  bodies  of  deeply  nested  loops  are  relevant  to  determining  that  an 
association  is  unexecutable.  When  no  ambiguity  results,  we  will  refer  to  applying 
rev.Tiie  rules  to  sub-path-expressions,  rather  than  to  the  corresponding  trees.  Figures 
5.5.1  and  5.5.3  show  how  the  trees  7"''  and  7"  '  evolve  as  the  algorithm  proceeds  in 
Examples  5.5.1  and  5.5.2,  respectively.  In  these  figures,  the  root  of  a  subtree  T  is  anno- 
tated with  a  list  of  the  elements  of  MUST-REDEFINE(T). 

Example  5.5.1 

Consider  again  the  definition-use  association  (2,(6,14),.m7rc/0  from  the  bubble-sort 
program  in  Figure  5.2.2.  Initializing  ^  and  a"^'"  gives  a'''"=2-p3-5-p6-14  and 
e=l  2  p3  5  P6  14  15. 
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We  first  direct  attention  toward  the  loop  symbol  P3.  The  7''' -reduction  of  T^  is  is  ^ 
itself.  p=l-2-p3"'^  and  •y^p3'""-5-p6-14-15.  Panial  symbolic  evaluation  of  y  gives 
/»PC^=£i=0>5).  Panial  symbolic  evaluation  of  p  gives  PCOMP^=^ii=\)  so  fj  is  p- 
unsatisfiable.  We  place  the  element  VAR1ABLES(£,)={(}  in  the  set  MUST- 
REDEnNE(7"P' )•  (See  Figure  5.5.1.a). 

We  next  apply  RW2'  to  7^',  replacing  P3  by  5=(3-4)*-3.  (See  Figure  5.5.1.b). 
There  are  no  opportunities  to  apply  RWl  to  5.  Control  goes  to  the  top  of  the  while  loop 
with  a'^"=2(3-4)*-3-5p(,-14  and  4=l-2(3-4)*-3-5p6l415. 

We  next  concentrate  on  loop  symbol  P6.  The  7^' -reduction  of  T""  is  is  ^  itself. 
(3=l-2(3-4)*-3-5pf,"'^  and  •j^p3'""1415.  Partial  symbolic  evaluation  gives 
PPC'^=ExM{n<2)  or  (not  switch))  and  PCOMP^=>{switch8i{n=5)),  so  £1  is  P- 
unsatisfiable.  The  set  VARIABLES(£i  )={"-^^'''<^'')  is  placed  in  MUST- 
REDEFINE(rP').  We  apply  RW7'.  replacing  p6  by  5=(6-7psl3)*-6.  (See  Figure 
5.5.1.CJ.  Since  5W7';c/i6  ALWAYS((6-7p)il3)j  we  can  apply  RWl  to  this  sub-path- 
expression.  (See  Figure  5.5.1.d).  After  simphfying,  we  have  a'''"=2-  (3-4)*-3-5-614 
where  {«.5w//c/i)e  MUST-REDEFINE(r^).  Since  {n^H/;c/i)cNEVER(6)  we  can  apply 
RWlto6.  (See  Figure  5. 5. l.e). 

Simplifying,  gives  ^=0.  (See  Figure  5.5.1.0-  Control  returns  to  the  top  of  the 
while  loop,  and,  since  there  are  no  remaining  loop  symbols,  exits  from  the  loop.  Since 
PPC^=PPC^ s.?\LSE,  PEM2  returns  "UNEXECUTABLE". 

Similarly,  if  wc  had  concentrated  attention  on  Pf,  first,  PEMZ  would  have  returned 
"UNEXECUTABLE".  In  fact,  in  this  case  it  would  not  have  had  to  apply  RW2'  to  7^'. 
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Example  5.5.2 

We  next  examine  a  somewhat  more  complicated,  though  admittedly  contrived, 
example  of  the  path  expression  method.  Consider  the  following  procedure,  where 
"def(x)"  and  "useW  are  abbreviations  for  some  statements  which  have  a  definition  or 
use,  respectively,  of  the  variable  x.  The  flow  graph  for  this  procedure  is  shown  in  Figure 
5.5.2. 

procedure  example5_5_2; 
var  x,y,z  :  integer; 
function  f(w:integer):  boolean; 
begin  (f) 

end;  {f} 
begin 

def(x); 

for  y  :=  1  to  5  do 

for  z  :=  1  to  5  do 
if  f(z)  then 
begin 

def(x); 
def(y) 
end: 
use(x) 
end; 
Consider  the  definition-use  association  (2,1  l,x).  The  path  expression  method  initializes  ^ 

to  1-2-P3-1M2  and  a'^'"  to  2-p3-ll.  (See  Fig  5.5.3a.) 

We  first  concentrate  on  the  loop  symbol  P3.  The  7^' -reduction  of  T^  is  ^.  (i= 
llpi'^P  and  -pp3^'"-lM2.  E^=PPC'^=(y>5)  and  PCOMP^^(y=0).  £1  is  p- 
unsatisfiable  so  the  element  (y)  is  added  to  MUST-REDEFINE(7"P'). 

Next,  JP'  is  replaced  by  T^  where  5=(3-4p5-10)*-3  is  equivalent  to  P3.  (See  Fig 
5.5.3b.)  Since  {y)  is  not  contained  in  NEVER(5)  and  there  is  no  subexpression  of  5  in 
which  x  is  always  redefined,  it  is  not  possible  to  apply  RWl  at  this  point.  Control  returns 
to  the  top  of  the  while  loop  in  PEM2  with  £=l-2-(3-4-p5-10)*-3-lM2  and  a"^-"  to 
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2(3  4p5  10)*-311. 

We  next  concentrate  on  the  loop  symbol  P5.  The  7 P' -reduction  of  T^  is 
l-2-3-4p5-10-31112.  (i=l-2-3-4p5'^^  and  ■yfcp5'""-10-31M2.  £,=/'/'C^=(z>5)  and 
PCOMP^=>iz=0).  £1  is  p-unsatisfiable  so  the  element  {y}  is  added  to  MUST- 
REDEHNECrP^ ).  Next,  T^'  is  replaced  by  T^  where  5=(6(7|^8)-9)*-5  is  equivalent 
to  P5.  (See  Fig  5.5.3c.)  Since  x  is  always  defined  in  node  8,  RWl  can  be  applied.  (See 
Fig  5.5.3d.)  Now  ^=l-2(3-4(6-7-8-9)*-510)*-31112.  Since  {yleMUST- 
REDEnNT((3-4(6-7-8-9)*-510)*-3)  and  {y}£NTVER((3-4(6-7-8-9)*-510)*-3), 
this  subexpression  is  RWl -safe.  Applying  RWl  gives  ^=r2-0-iri2  (See  Fig  5.5.3e.) 
and  simplifying  gives  ^=0.  So  PEM2  exits  from  the  while  loop  and  returns  "UNEXE- 
CUTABLE".  Note  that  when  (y)  was  added  to  MUST-REDEnNE(rP'),  {y}  was  not 
contained  in  NEVER(p3)  so  it  was  not  possible  to  apply  RWl  at  the  root  R  of  this  sub- 
tree. However,  after  applying  RW2'  to  T^^  then  to  T^^,  and  applying  RWl  to  T^  more 
details  were  known  about  the  viable  paths.  At  this  point  the  subtree  rooted  at  R  became 
RWl -safe. 

6.  Enhancements  and  Limitations 

In  this  section  we  will  discuss  limitations  of  the  path  expression  method  and  pro- 
pose some  enhancements.  These  enhancements  include  the  relaxation  of  some  of  the 
assumptions  about  the  procedure  being  tested,  as  well  as  the  use  of  more  powerful  path 
expression  rewriting  rules  and  symbolic  evaluation  techniques.  We  will  also  briefly  dis- 
cuss other  uses  of  the  path  expression  method.  Throughout  this  section,  when  we  refer  to 
the  "path  expression  method"  we  v.ill  mean  the  implementation  of  function  PEM2  dis- 
cussed in  section  5.5,  unless  otherwise  noted. 
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Since  it  is  undecidable  whether  or  not  a  panicular  definition-use  association  is  exe- 
cutable, it  is  inevitable  that  the  path  expression  method  (even  in  its  more  general  form) 
cannot  identify  all  unexecutable  associations.  Rather  than  trying  to  formally  classify 
those  programs  for  which  the  path  expression  is  guaranteed  to  be  successful,  we  will 
examine  a  situation  in  which  the  path  expression  fails  to  identify  unexecutable  associa- 
tions as  such. 

Example  5.6.1: 

Consider  the  procedure  GeiWord  in  the  following  program: 


program  driver(input, output); 
const  MAX  =  5; 

type   String  =  array[  1  ..MAX]  of  char; 
var     a  :  String; 

i, length  :  integer; 
procedure  GetWord(var  word  :  String;  var  n:  integer); 
( Reads  a  word  and  changes  lower  case  letters  into  caps} 
var     i  :  O..MAX; 

offset :  integer, 
begin 

wTiteln( 'Enter  a  word  of  length  <=  ',MAX:1 ); 

n  :=  0; 

while  ((not  eoln)  and  (n  <  MAX))  do 

begin 

n  :=  n  +  1; 
read(word[n]) 

end; 

offset  :=  ord(  'a')  -  ord('A'); 

for  1  :=  1  to  n  do 

if  ('a'  <=  word(il)  and  (word[i]  <='z') 
then  word[i|  :=  chr(  ord(word[i])  -  offset) 
end,  IGctW'ord) 

begin  (driver) 

GctWordl  a, length); 

for  1  :-  1  to  length  do  wriie(a[ij); 

writcin; 
end 

The  flow  graph  for  this  procedure  is  shown  in  Figure  5.6.1.  Wc  will  now  apply  the  path 


(p 


©n-O; 


read(a[n]); 


^  offset:=ord('a')-ord('A'); 


\1^  i:=  1; 


a'>word[i]  or  'z'<word[i] 


Figure  5.6. 1 
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expression  method  to  the  definition-use  association  (1,8, word).  This  association  arises 
from  the  fact  that  word  is  defined  in  node  1  (since  it  is  a  var  parameter)  and  used  in  node 
8,  and  there  is  a  definition  clear  path  with  respect  to  word  from  2  to  8. 

Initially, 

a'^."=l*'«'.2-p3-5p6-7-8'^^ 

and 

^=l-2p3-5p6-7  810p6lll2. 

We  will  first  concentrate  attention  on  the  loop  symbol  P3.  The  JP' -reduction  of  T"^  is  is 
^itself.  (i=l-2-p3'^^  and">tp3'"^'-5p6-7-p8-910p6-lM2. 

To  find  a  set  of  y-necessary  predicates  we  derive  the  partial  path  condition  of  y: 

E i^PC''=ieoln  OT  n>5). 
To  check  whether  £)  is  (i-unsatisfiable,  we  derive  the  partial  path  computation  of  [3: 

FCOMP^={n=Ql). 
Since  the  predicate  PCOMP^&Ei={n=0)&ieoln  or  n>5)={eoln=TR\JE)  is  satisfiable, 
we  cannot  show  £,  to  be  (J-unsatisfiable,  and  therefore  cannot  add  any  elements  to  the 
collection  MUST-REDEFINE(rP» ). 

The  path  expression  P3  is  equivalent  to  5=(3-4)*-3.  Since  word  is  defined  in  node 
4,  "we  apply  RW'l  to  the  subexpression  4  of  5.  Simplifying  gives  5'=3,  and  applying 
RW2'givesa''"=l''"'  235  p.  7  8'"^and^=I  235  p,-7-810p^  11  12. 

We  next  concentrate  on  the  loop  symbol  p^.  Lei  T^"  denote  the  subtree  correspond- 
ing to  the  occurrence  of  P(,  in  a''".  The  T^' -reduction  of  T^  is  is  ^  iiself. 
[J=i:35pt,"^  and  Y=P6'"''7-810  p^,  11  i: 

PPC^Mi^)&C^<word[i])S:[word\i]^'z).    Let  £,=(/^/i).    rCOMP^=>{i  =  \)&{n=€)) 
so£i  is  p-unsatisfiablc.  So  the  clement  li.n)  is  placed  in  MrST-REDEFINE(7''«). 
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RW2'  is  applied,  replacing  p^  by  the  equivalent  path  expression 
(6-7(8^9)10)*-6  Since  word  is  defined  in  node  8  (or  since  node  8  is  the  target  node) 
RWl  is  applied  to  node  8.  At  this  point,  a'''"=  l'""-2-3-5(6-7-910)*-6-7-8'°'',  and 
^=l-2-3-5-(6-7-910)*-6-7-810p6-lM2.  Since  {i,n)  is  not  a  subset  of 
NEVER((6-7-9-10)*-6),  no  funher  re-writing  can  be  done  at  this  point. 

Since  a''*"  has  no  remaining  loop  symbols,  we  exit  from  the  while  loop  and  com- 
pute PPC^.  Since  PPC^={{n=Q)&.eoln)  is  not  equivalent  to  FALSE,  the  algorithm 
returns  "POSSIBLY-EXECUTABLE". 

However,  the  association  is  in  fact  unexecutable.  This  can  be  seen  by  observing 
that  the  value  of  /  increases  in  the  loop  headed  by  node  6.  Thus,  since  the  predicate 
(/  =  l)&(rt=0)  holds  before  entering  the  loop  and  the  value  of  n  does  not  change  in  the 
loop,  the  exit  condition,  {i>n),  can  never  hold.  So  none  of  the  candidates  are  viable. 

Intuitively,  the  reason  that  the  path  expression  method  fails  to  conclude  that  this 
association  is  unexecutable  is  because  it  takes  into  account  the  fact  that  /  is  redefined  in 
the  loop,  but  fails  to  take  into  account  the  way  in  which  it  is  redefined. 

There  are  several  ways  to  enhance  the  path  expression  method  so  that  unexecutable 
associations  like  this  one  would  be  detected.  One  method  would  be  to  augment  the  pro- 
cedure with  user-supplied  (or  programmer-supplied)  assertions  and  allow  the  partial  sym- 
bolic evaluator  to  use  the  assertions  in  the  obvious  way.  If  the  assertion  "/>!"  were 
added  on  branch  (6,7)  then  partial  symbolic  evaluation  of 

^=1  2-3  5(6  7  9  10)*-6  7-810p6  11  12 
would  yield  PPC^=^{n=0)&  eoln  &ii'>\)&{i'<n)=FALSE  where  /'  is  the  symbolic 
value  of  i  after  executing  the  sub-path  expression  l-2-3-5(6-7-910)*.   Thus  the  algo- 
rithm would  return  "UNEXECUTABLE". 
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Another  way  to  enhance  the  path  expression  method  would  be  to  use  a  more  power- 
ful symbolic  evaluation  technique  than  partial  symbolic  evaluation.  While  global  sym- 
bolic evaluation  in  its  full  generality  is  not  feasible,  some  limited  form  in  which 
recurrence  relations  are  derived  and  solved  for  some  but  not  all  variables  could  be  practi- 
cal. For  example,  the  recurrence  relation  for  variable  /  in  the  loop  headed  by  node  6  is 
simple: 

/°=1 

,'^=/*-i  +  l  fork>0 

where  /*  denotes  the  value  of  /  after  k  iterations  of  the  loop.  Solving  the  recurrence  rela- 
tion yields  /  =k-t-l  for  k>0  which  implies  that  the  value  of  i  is  greater  than  or  equal  to  1 
when  branch  (6,7)  is  executed.  An  enhanced  version  of  the  path  expression  method 
which  employed  symbolic  evaluation  techniques  powerful  enough  to  infer  this  fact 
would  return  "UNEXECUTABLE"  for  this  association. 

Another  approach  to  enhancing  the  path  expression  method  is  to  include  more 
powerful  revmte  rules.  Restricting  the  rewrite  rule  RW2  to  R\V2'  was  a  severe  restric- 
tion, aimed  at  ensuring  the  efficiency  of  the  algorithm.  If  somewhat  less  stringent  restric- 
tions were  placed  on  R\V2,  a  more  powerful  algorithm  would  result.  Consider  for  exam- 
ple the  rewrite  rule 

RW2": 

Replace  a  subtree  7"  of  7"^  by  7"'^  where  a=(Y-5)*Yand  [i=Y  (5y)*. 

If  the  path  expression  method  were  enhanced  to  allow  application  of  R\V2"  as  well  as 
RWl,  and  R\V2'  then  ^=1  2  35  (6-7-9- 10)*  6  7  S  10  p„  II  12  could  be  replaced  by 
V^l  2  .^  >U^  (•>  10  U  -)♦  S  10  p,  11  12  Since  P/T^'hFALSF,  the  path  expression 
method  would  return  "UNEXECUTABLE". 
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More  research  is  needed  to  determine  which  rewrite  rules  and  which  more  powerful 
symbolic  evaluation  techniques  could  be  added  to  the  path  expression  method  without 
unduly  compromising  efficiency. 

We  next  discuss  the  prospects  for  relaxing  some  of  the  assumptions,  made  in  section 
5.1,  about  the  procedure  P  being  tested.  We  will  discuss  the  way  various  strategies  for 
relaxing  assumptions  affect  the  power  of  the  path  expression  heuristic. 

The  first  assumption  about  P  was  that  no  aliasing  occurs  in  P.  There  are  several 
ways  in  which  aliasing  can  arise  in  a  procedure:  Ifx  and  y  are  var  formal  parameters  in  P 
and  P  is  called  with  the  same  actual  parameter  passed  to  x  and  >'  then  x  and  y  are  aliases. 
Similarly,  if  a:  is  a  global  variable  which  is  also  passed  to  the  var  parameter  y  then  x  and 
>"  are  aliases.  If  p  and  q  are  pointer  variables  with  the  same  value  then  p'  and  q'  are 
aliases.  If  a  is  an  array  and  i  and  j  have  the  same  value  then  a  [i  ]  and  a  [j]  are  aliases. 

In  order  for  the  path  expression  method  to  function  correctly  it  is  necessary  that  all 
aliasing  be  known.  U x  and  y  are  aliased  then  any  assignment  to  x  is  also  an  assignment 
to  y.  This  must  be  taken  into  account  both  in  the  symbolic  evaluation  and  in  the  checks 
for  whether  a  variable  is  in  ALWAYS(a),  SOMETIMES  (a),  or  KEVER(a)  for  a  path 
expression  a.  Two  standard  approaches  to  insuring  that  all  aliasing  is  known  are  to  res- 
trict the  kind  of  aliasing  which  can  occur,  and  to  allow  any  kind  of  aliasing  but  make 
conservative  assumptions  about  the  values  of  variables. 

The  later  approach  entails  considering  each  definition  of  a  variable  x  to  also  be  a 
definition  of  variable  >•  if  .x  and  y  can  potentially  be  aliased.  This  is  the  approach  we  have 
taken  to  handling  arrays.  One  could  apply  this  approach  to  other  kinds  of  aliasing  as  fol- 
lows: Replace  NEVER(a)  by  NEVER' (a)  and  SOMETlMES(a)  by  SOMETIMES'{a) 
where  blE\'ER'{a)={x\  neither  x  nor  any  variable  which  is  potentially  aliased  to  x  is 
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defined  on  any  element  of  PATHS(a)),  and  where  SOMEJlMES'{a)=  complement  of 
NEVER'ia.).  In  partial  symbolic  evaluation  of  a  statement  in  which  a  variable  x  is 
defined,  it  would  be  necessary  to  also  note  that  any  variable  which  is  potentially  aliased 
to  X  could  also  change  value.  For  example,  if  >•  is  a  potential  alias  of  j:,  then  after  sym- 
bolically executing  the  statement  "jc  :=  z",  the  new  symbolic  value,  Y  of  y  is  given  by 
Y=Z  or  Y-Y',  where  Y'  is  the  old  symbolic  value  of  y  and  Z  is  the  symbolic  value  of  z. 
If  there  are  a  lot  of  variables  which  are  potential  aliases,  this  can  become  quite  unwieldy. 

A  simpler  approach  to  partial  symbolic  evaluation  in  the  presence  of  aliasing  is  to 
assign  a  unique  new  symbol  to  each  potential  alias  of  x  whenever  an  assignment  to  x  is 
made.  This  is  computationally  simple,  but  could  result  in  too  much  loss  of  information. 
Most  likely,  the  best  method  of  relaxing  the  "no  aliasing"  assumption  would  involve  a 
combination  of  restricting  the  type  of  aliasing  which  could  occur  and  making  conserva- 
tive assumptions  about  the  ways  in  which  values  of  variables  can  change.  Perhaps 
heuristic  approaches  to  determining  whether  variables  which  are  potential  aliases  are 
actually  aliased  would  be  fruitful. 

The  second  assumption  about  P  (no  hidden  side-effects  and  no  functions  with  var 
parameters)  was  also  motivated  by  concern  for  ensuring  the  accuracy  of  the  data  flow 
information  collected  and  the  panial  symbolic  evaluation.  Approaches  to  relaxing  this 
assumption  are  similar  to  those  to  relaxing  the  first  assumption.  By  performing  interpro- 
ccdural  data  flow  analysis  [MUC81,BUR861,  it  is  possible  to  "uncover"  hidden  side 
effects.  In  those  cases  where  it  is  not  possible  to  tell  for  certain  whether  a  called  pro- 
cedure or  function  has  side  effects  it  is  necessary  to  make  conservative  a.ssumptions. 

The  third  assumption  is  that  functions  have  no  \ar  parameters.  This  assumption  is  a 
convenience  which  simplifies  the  symbolic  evaluation  of  function  calls.   It  can  ea.sily  be 
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eliminated. 

The  founh  assumption,  that  P  has  no  repeat-until  statements,  was  included  as  a 
technical  convenience  in  order  to  insure  that  all  loops  in  P's  flow  graph  have  exits  from 
the  top.  One  way  to  relax  this  assumption  would  be  to  have  different  kinds  of  loop  sym- 
bols for  repeat-until  loops  and  for  while  loops  and  to  expand  the  set  of  rewrite  rules 
accordingly.  It  also  seems  likely  that  the  path  expression  method  could  be  generalized  to 
handle  arbitrary  reducible  flow  graphs.  The  concept  of  loop  symbols  representing  the  set 
of  paths  through  a  loop  would  be  replaced  by  "interval  symbols"  representing  the  paths 
through  an  interval.  For  arbitrary  flow  graphs,  some  variant  of  PEl  could  probably  be 
developed.  Such  an  algorithm  would  use  regular  expressions  without  loop  symbols,  and 
would  initialize  the  path  expression  ^  using  the  algorithm  in  [TARS IB].  More  investiga- 
tion of  these  ideas  is  needed. 

Up  to  this  point  we  have  focussed  on  the  use  of  the  path  expression  method  for 
attempting  to  determine  that  a  given  definition-use  association  is  unexecutable.  This  has 
primarily  been  motivated  by  the  fact  that  if  we  show  that  all  of  the  definition-use  associa- 
tions required  by  a  data  flow  testing  criterion  C  but  not  covered  by  a  test  set  T  are  unexe- 
cutable then  we  have  shown  that  T  is  C*-adequate.  We  will  now  briefly  discuss  two 
other  applications  of  the  path  expression  method. 

The  path  expression  method  can  also  be  used  as  an  aid  for  test  data  generation. 
Suppose  (d,u,v)  is  an  executable  definition  use  association.  If  we  apply  the  path  expres- 
sion method  to  (d,u,v)  it  will  return  "Possibly  Executable".  The  final  value  of  ^  will  be  a 
path  expression  representing  the  final  set  of  candidates.  This  may  be  a  proper  subset  of 
the  initial  set  of  candidates.  By  panially  symbolically  executing  the  path  expression  ^ 
we  obtain  the  ^-necessary  predicate  PPC^.  Since  any  test  element  which  covers  (d,u,v) 
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must  satisfy  PPC^,  we  need  only  consider  such  elements  as  potential  test  cases.  Note 
however,  that  while  PPC^  is  a  necessary  condition  for  a  test  case  to  cover  (d,u,v)  it  is  not 
sufficient. 

The  path  expression  method  may  also  be  useful  in  compiler  optimization.  Many  of 
the  code  transformations  performed  during  optimization  involve  finding  the  definitions  of 
a  variable  which  reach  a  particular  point  p,  that  is,  definitions  d  of  variable  v  such  that 
there  is  a  definition  clear  path  with  respect  to  v  from  d  to  p.  Compilers  generally  make 
the  conservative  assumption  that  all  definitions  which  reach  a  point  are  relevant.  How- 
ever, if  the  only  definition  clear  paths  with  respect  to  v  from  d  to  p  are  unexecutable,  then 
the  value  w  hich  is  assigned  to  v  at  d  cannot  effect  the  value  of  v  at  p.  In  our  notation,  if 
the  definition-use  association  (d,p,v)  is  unexecutable,  then  the  optimizing  compiler  can 
safely  remove  d  from  the  set  of  definitions  of  v  which  reach  p. 

Consider,  for  example,  constant  propagation.   Suppose  that  definitions  d\ di^  of 

variable  v  all  reach  point  p.  Suppose  that  at  dj ^a,  the  value  assigned  to  v  is  zero,  but 

ax  d]  it  is  non-zero.  If  the  definition-use  association  {d\,p,\')  can  be  shown  by  the  path 
expression  method  to  bo  unexecutable,  then  whenever  control  reaches  point  p,  the  value 
of  v  is  guaranteed  to  be  zero.  Thus  any  use  of  v  in  the  statement  at  point  p  can  be 
replaced  by  the  constant  zero.  Similarly,  the  path  expression  method  could  be  used  to 
increase  the  accuracy  of  copy  propagation,  live-variable  analysis,  detection  of  induction 
variables,  detection  of  parallelism,  etc. 

More  work  is  needed  to  determine  when,  whether,  and  to  what  extent  the  cost  of 
using  the  path  expression  method  to  eliminate  "unexecutable"  reaching  definitions  is 
justified  h\  the  improvement  in  the  compiled  code. 
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CHAPTER6: 
Conclusion 

1.  Summary 

In  this  thesis  we  have  investigated  two  families  of  test  data  adequacy  criteria,  the 
data  flow  testing  criteria  and  the  feasible  data  flow  testing  criteria.  We  have  used  the 
inclusion  relation  to  compare  these  criteria  to  one  another  and  to  statement,  branch  and 
path  testing;  have  explored  the  extent  to  which  each  criterion  satisfies  Weyuker's  axioms 
for  a  good  adequacy  criterion;  and  have  examined  prospects  for  building  testing  tools 
based  on  the  criteria.  We  have  described  the  design  and  implementation  of  a  tool, 
ASSET,  which  is  based  on  data  flow  testing,  and  have  described  a  heuristic,  the  path 
expression  method,  which,  in  conjunction  with  ASSET  partially  automates  feasible  data 
flow  testing. 

The  data  flow  testing  criteria  are  based  purely  on  the  syntax  of  the  program  being 
tested.  A  definition  use  association  is  either 

1)  a  triple  (d,u,v),  where  d  is  a  node  having  a  definition  of  variable  v,  u  is  a  node  hav- 
ing a  use  of  V,  and  there  is  a  definition-clear  path  with  respect  to  v  from  d  to  u.  or 

2)  a  du-path  with  respect  to  v. 

The  criteria  require  the  test  data  to  exercise  cenain  subsets  of  the  set  of  definition-use 
associations.  These  criteria  were  originally  defined  by  Rapps  and  Weyukcr  to  apply  to 
programs  written  in  a  simple  language.  We  have  extended  the  definitions  to  apply  to 
subprograms  written  in  a  large  subset  of  Pascal.  This  has  been  done  in  such  a  way  as  to 
preserve  the  inclusion  relationships  among  the  criteria. 
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The  data  flow  testing  criteria  fail  to  satisfy  several  of  Weyuker's  axioms  for  a 
"good"  adequacy  criterion.  In  particular,  they  fail  to  satisfy  the  applicability  axiom, 
which  requires  that  an  adequate  test  set  exist  for  each  subprogram.  Furthermore,  it  is 
undecidable  whether  an  adequate  test  set  exists  for  a  given  subprogram.  This  is  not 
surprising,  since  the  data  flow  testing  criteria  are  structured  testing  criteria,  and  all  of  the 
structured  testing  criteria  have  this  weakness.  For  the  data  flow  testing  criteria,  even 
quite  "ordinary"  subprograms  may  fail  to  be  adequately  testable. 

The  feasible  data  flow  testing  criteria  are  derived  from  the  data  flow  testing  criteria 
by  eliminating  from  consideration  those  definition-use  associations  which  can  never  be 
exercised.  All  but  one  of  the  feasible  data  flow  testing  criteria  satisfy  all  of  Weyuker's 
axioms,  including  applicability  property.  However,  given  test  set  T,  an  FDF  testing  cri- 
terion C,  and  a  subprogram  P,  it  is  undecidable  whether  T  is  C-adequate  for  P. 

We  have  shown  that  the  inclusion  relations  among  the  feasible  data  flow  testing  cri- 
teria differ  from  those  among  the  data  flow  testing  criteria.  While  all-du-paths  =>  all- 
uses,  (all-du-paths)*  is  incomparable  to  all  of  the  other  feasible  data  flow  testing  criteria. 
Also,  while  all-p-uses  =>  all-edges,  (all-p-uses)*  and  (all-edges)*  are  incomparable.  This 
means  that  the  feasible  data  flow  testing  criteria  do  not  "bridge  the  gap"  between  (the 
feasible  analogs  oO  branch  testing  and  path  testing.  However,  if  for  class  of  subpro- 
grams satisfying  the  quite  reasonable  NFUP  property,  defined  in  chapter  4,  (all-p-uses)* 
=>  (all-edges)*. 

The  fact  that  the  relationship  among  the  feasible  data  flow  testing  criteria  differs 
from  the  relationship  among  the  data  flow  testing  criteria  has  implications  for  studies  of 
the  formal  properties  of  other  testing  criteria.  Several  criteria  which  fail  to  satisfy  the 
applicability  property  (including  all  of  the  structured  testing  criteria)  have  been  defined 
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and  some  of  their  formal  propenies  have  been  investigated.  The  criteria  which  are  actu- 
ally used  in  practice,  however,  are  in  the  spirit  of  the  feasible  data  flow  testing  criteria.  It 
is  these  criteria,  which  take  account  of  semantic  information  which  should  be  studied,  not 
their  purely-syntactic  counterparts. 

For  example,  consider  a  practitioner  who  is  trying  to  branch  test  a  subprogram  P 
with  an  unexecutable  branch  b.  At  some  point,  observing  that  none  of  the  test  cases  have 
caused  b  to  be  executed,  she  will  examine  P,  determine  that  b  is  unexecutable,  and  elim- 
inate it  from  consideration.  Thus,  the  criterion  being  used  is  actually  (all-edges)*,  not 
all-edges. 

The  theorems  appearing  in  the  literature  which  compare  various  structtired  testing 
criteria  should  be  re-examined  in  this  light.  That  is,  for  each  of  these  criteria  C,  a  cri- 
terion C*  should  be  defined,  which  agrees  with  C  for  programs  having  no  unexecutable 
paths,  but  which  satisfies  the  applicability  property.  The  relationship  among  the  starred 
criteria  should  be  then  examined.  If  criterion  C\  does  not  include  criterion  C2  then  C\* 
will  not  include  critenon  Cj*-  However,  if  C\=^C2  it  will  not  necessarily  be  the  case 
thatri*=>C2*. 

In  order  to  determine  uhethcr  a  test  set  T  satisfies  a  feasible  data  flow  testing  cri- 
terion C,  it  is  necessary  to  examine  all  of  the  definition-use  associations  re<]uired  by  C 
but  not  covered  by  T  to  sec  whether  any  of  them  are  executable.  This  may  be  a  tedious 
and  error  prone  task.  With  that  in  mind,  we  have  developed  a  heuristic  method,  the  path 
expression  method,  which  attempts  to  determine  whether  a  given  definition-c-usc  associ- 
ation or  dcfinition-p-use  association  is  executable. 

Given  a  definition-use  association  (d.u.v),  a  viable  candidate  is  an  executable  path 
from  subprogram  entry  to  subprogram  exit,  which  has  a  definition  clear  subpaih  with 
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respect  to  v  from  d  to  u.  The  path  expression  method  uses  a  regular  expression  I,  to 
represent  a  superset  of  the  set  of  viable  candidates.  A  combination  of  symbolic  evalua- 
tion and  data  flow  information  is  used  to  weed  out  paths  which  can  be  shown  not  to  be 
viable  candidates,  and  ^  is  rewritten  accordingly.  If,  when  the  heuristic  terminates,  t, 
represents  the  empty  set  of  paths,  then  the  association  (d,u,v)  is  guaranteed  to  be  unexe- 
cutable. 

The  path  expression  method  uses  a  new  symbolic  evaluation  technique  called  partial 
symbolic  evaluation  of  path  expressions.  Roughly  speaking,  partial  symbolic  evaluation 
of  a  path  expression  a  which  represents  a  single  path  n  is  equivalent  to  symbolic  execu- 
tion of  K,  while  partial  symbolic  evaluation  of  a  path  expression  a  which  represents  more 
than  one  path  assigns  a  new  symbolic  value  to  those  variables  which  are  potentially 
redefined  on  any  one  of  the  paths.  The  result  of  partially  symbolically  evaluating  path 
expression  a  is  an  expression  which  describes  in  part  the  effect  of  executing  any  one  of 
the  paths  represented  by  a. 

The  path  expression  method  and  partial  symbolic  evaluation  of  path  expressions 
were  developed  in  order  to  help  eliminate  from  consideration  unexecutable  definition-use 
associations.  They  can  also  be  used  as  an  aid  to  test  data  generation.  If  the  path  expres- 
sion method  returns  the  answer,  "association  may  be  executable",  then  ^  represents  a 
non-empty  set  of  paths,  which  includes  all  of  the  viable  candidates.  Symbolic  evaluation 
of  ^  gives  a  predicate  which  must  be  satisfied  by  any  test  case  which  covers  the  given 
definition-use  association. 

We  believe  that  the  path  expression  method  will  be  able  to  identify  a  significant 
number  of  the  unexecutable  definition-use  associations  in  "typical"  subprograms.  In  con- 
junction  with   ASSET  this  will   form   a  usable  tool   for  feasible   data  flow  testing. 
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Typically,  the  user  would  invoke  the  path  expression  heuristic  when,  after  a  "reasonable" 
amount  of  testing,  some  definition-use  associations  remained  unexecuted.  If  the  heuristic 
determined  that  the  association  was  unexecutable,  the  user  could  safely  eliminate  it  from 
consideration.  Otherwise  the  user  would  be  left  to  determine  by  hand  whether  that  asso- 
ciation was  in  fact  unexecutable,  or  to  find  test  data  to  cover  the  association.  The  final 
path  expression  ^  produced  by  the  heuristic  would  provide  some  helpful  information  to 
the  user.  Preliminary  experiments  indicate  that  this  is  a  reasonable  scenario,  but  more 
experimentation  is  needed. 

2.  Future  Work 

There  are  many  avenues  for  future  research.  Several  questions  relating  to  the 
enhancement  of  the  path  expression  method  and  of  symbolic  evaluation  of  path  expres- 
sions are  suggested  in  Chapter  6,  section  5.  Some  other  topics  for  future  work  include 

•  Empirical  and  theoretical  studies  of  the  effectiveness  of  feasible  data  flow  testing: 

Such  studies  would  be  aimed  at  measuring  the  error  detecting  ability  of  the  criteria. 
In  order  to  perform  empirical  studies,  a  large  sample  of  "real"  programs,  preferably  with 
known  errors,  would  be  needed. 

Theoretical  studies  would  have  as  their  goal  the  classification  of  those  errors  which 
are  guaranteed  to  be  detected  by  the  various  criteria.  Theoretical  and  empirical  studies  of 
the  amount  of  test  data  required  by  the  criteria  would  also  be  useful.  A  related  issue  is 
the  classification  of  program  features  which  make  feasible  data  flow  testing  more  or  less 
effective. 

•  Empirical  and  theoretical  studies  of  effectiveness  of  heuristic: 
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Empirical  studies  would  be  aimed  at  determining  the  percentage  of  unexecutable 
associations  which  are  identified  as  such  by  the  heuristic.  We  are  currently  conducting  a 
pilot  study  in  which  we  are  examining  the  subprograms  in  Kemighan  and  Plauger's 
Software  Tools  in  Pascal  (KER  81]. 

Theoretical  studies  would  attempt  to  classify  those  unexecutable  associations  which 
are  guaranteed  to  be  identified.  A  related  issue  is  the  classification  of  program  features 
which  make  identifying  unexecutable  associations  harder  or  easier. 

•  Investigate  the  effect  of  using  more  sophisticated  data  flow  analysis: 

The  data  flow  analysis  used  to  define  the  DF  and  FDF  criteria  was  quite  unsophisti- 
cated. It  seems  reasonable  to  expect  that  using  more  sophisticated  data  flow  analysis 
techniques  would  lead  to  criteria  which  demanded  "better"  test  data.  For  example,  using 
interprocedural  data  flow  analysis,  one  could  define  criteria  which  apply  to  larger  pro- 
gram units  than  procedures  and  functions.  It  might  also  be  possible  to  define  criteria 
which  treat  array  elements  as  individual  variables,  using  heuristics  to  attempt  to  disambi- 
guate array  references.  One  could  then  explore  the  trade-off  between  difficulty  of  apply- 
ing the  criteria  to  a  program  and  test  set,  versus  error-detecting  ability  of  the  test  sets 
demanded  by  the  criteria. 

•  Extend  data  flow  testing  and  feasible  data  flow  testing  to  apply  to  programs  written  in 
other  languages: 

Depending  on  the  language  in  question,  this  could  involve  handling  minor  issues, 
such  as  static  variables  and  definitions  within  expressions,  and/or  major  issues  such  as 
those  arising  in  languages  which  explicitly  suppon  parallelism. 
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1.  INTRODUCTION 

ASSET  (A  System  to  Select  and  Evaluate  Tests)  is  an  interactive  software  testing 
tool  based  on  data  flow  testing  (1,2,3,4].  It  lakes  as  input  a  Pascal  program,  a  set  of  test 
data,  and  one  of  the  data  flow  testing  criteria.  It  runs  the  program  on  the  test  data  and 
produces  a  list  of  those  definition-use  associations  required  by  the  given  cnterion  but  not 
covered  by  the  test  data. 

The  user  then  examines  the  output  to  determine  whether  the  program  has  behaved 
according  to  its  specification.  If  the  program  computed  the  w-Tong  output,  the  user 
rcpons  that  the  program  has  a  bug.  If  the  program  has  behaved  correctly  on  all  of  the  test 
cases,  and  if  the  criterion  has  not  been  satisfied,  the  user  selects  additional  test  data  and 
continues  the  testing  process.  If  the  program  has  behaved  correctly  and  the  criterion  has 
been  satisfied  the  user  can  either  release  the  program,  certifying  that  it  has  been  ade- 
quately tested  according  to  the  criterion,  or  can  select  another  criterion  and  continue. 

ASSET  is  used  for  testing  an  individual  subprogram  (procedure,  function,  or  main 
program)  which  we'll  call  the  "subject  procedure".  We  will  refer  to  the  program  of 
which  the  subject  procedure  is  pan  as  the  "subject  program".  The  data  flow  analysis 
which  it  performs  is  inira-procedural  analysis  of  the  subject  procedure. 

This  manual  assumes  that  the  reader  has  some  familiarity  with  the  theor>'  of  data 
flow  testing  and  with  the  UNIX  operating  system.  To  acquire  some  familiarity  with  data 
flow  testing,  sec  the  references  in  section  7.  Section  2  of  this  manual  defines  the  rcstnc- 
tions  on  programs  accepted  by  ASSET.  Section  3  describes  the  most  imponant  ASSET 
commands.  Section  4  describes  the  rest  of  the  ASSET  commands.  Section  5  descnbcs 
how  to  use  ASSET  on  a  separately  compiled  program.  Section  6  gives  an  example  of  an 
ASSET  session.  Section  7  gives  references  to  relevant  papcrb. 

AcknowU'dm'mcnts: 

I  he  graphics  facility  was  s^ritten  by  Stewart  Weiss,  who  also  helped  implement  an 
earlier  version  of  ASSET.  The  user- interface  was  translated  from  csh  to  C  and  enhanced 
by  Ernie  Campbell. 
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Disclaimer: 

ASSET  is  still  a  "work  in  progress".  We  do  not  vouch  for  its  reliability  nor  are  we 
responsible  for  maintaining  it.  However,  if  you  find  any  bugs,  or  have  other  suggestions 
as  to  how  the  tool  could  be  improved  we  would  appreciate  hearing  from  you.  Please 
send  e-mail  to  frankl@csd2.nyu.edu  and  to  weyuker@csd2.nyu.edu. 

2.  THE  LANGUAGE  ACCEPTED  BY  ASSET 

ASSET  accepts  syntactically  correct  programs  written  in  the  subset  of  (level  0)  ISO 
Pascal  consisting  of  programs  which  do  not  include  any  of  the  following  constructs: 

1.  goto  statements  and  labels. 

2.  functions  with  var  parameters. 

3.  variant  records 

4.  with  statements 

5.  procedural  or  functional  parameters 

6.  forward  declarations 

7.  conformant  arrays 

The  following  identifiers  are  used  in  the  modified  version  of  the  subject  program  and 

therefore  should  not  be  used  in  the  subject  program: 
FW 
traversed 

The  following  identifiers  are  used  as  file  names  by  ASSET  and  therefore  should  not  be 

used  as  file  names  in  the  subject  program: 

IncFiles,  StringTabFile,  SymTabFile,  anomalies,  a.out,  comp.msgs,  copy, 
copy. exec,  copy.p,  copy2,  copy2.p,  defuse,  display,  globdecs,  graph, 
junkfile,  listing,  mainfile,  mainfile2,  pairs,  paths,  picfile,  record,  results, 
results. old,  subject,  traversed 

It  is  recommended  that  the  user  compile  the  program  before  submitting  it  to  ASSET  in 

order  to  insure  that  it  is  syntactically  correct. 

In  addition,  ASSET  accepts  certain  Berkeley  Pascal  constructs  which  are  not  part  of 
standard  Pascal,  including  separate  compilation  (see  section  5). 

ASSET  places  certain  arbitrary  restrictions  on  the  size  of  the  program.  Specifically, 
number  of  variables  visible  to  subject  procedure  <=  3(X) 
number  of  procedures  and  functions  visible  to  subject  procedure  <=  300 
number  of  types  visible  to  subject  procedure  <=  300 
number  of  constants  visible  to  subject  procedure  <=  3(K) 
number  of  identifiers  visible  to  subject  procedure  <=  500 
length  of  each  identifier  <=  80 

total  number  of  characters  in  all  identifiers  visible  to  subject  procedure  <=  3(X)0 
number  of  basic  blocks  in  subject  procedure  <=  ICX) 
For  instructions  on  how  to  increase  these  limits,  see  Appendix  I. 

BUGS: 

1.   ASSET  requires  that  the  last  statement  in  a  case  statement  be  followed  by  a  semi- 
colon. For  example,  the  (syntactically  correct)  program  fragment 
case  b  of 

true:  writeln('hello'); 

false:  wnteln('goodbye') 
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end 
will  be  rejected  by  ASSET,  but 
case  b  of 

true:  writelnChello'); 
false:  writeln('goodbye'); 
end 
will  be  accepted. 

The  user  should  check  that  all  case  statements  in  the  subject  procedure  are  of  this  form 
before  submitting  the  program  to  ASSET. 

2.  ASSET  translates  for  loops  incorrectly:  If  a  statement  of  the  form 

"for  i  :=  el  toe2  do  S" 
appears  in  the  subject  procedure,  in  writing  the  modified  subject  procedure  ASSET  will 
translate  this  to  a  code  fragment  functionally  equivalent  to 

i  :=el; 
<label  1  >:       if  i  >  e2  then  goto  <label2> 
S; 

i  :=  succ(i); 
goto  <labell>; 
<label2>:        next  statement. 

It  should  translate  it  to  a  fragment  functionally  equivalent  to 
i  :=el; 
tmp  :=e2; 
<labell>:      if  i  >  tmp  then  goto  <label2> 
S; 

i  :=  succ(i); 
goto  <label  1  >; 
<label2>:        next  statement; 

Thus,  if  any  variable  occurring  in  expression  is  modified  within  the  statement  S  (which  is 
legal,  though  poor  programming  practice),  the  modified  subject  program  will  not  be 
equivalent  to  the  subject  program. 

These  bugs  will  be  corrected  in  the  next  version  of  ASSET. 

3.  THE  MOST  IMPORTANT  ASSET  COMMANDS 

In  this  section  we  describe  the  basic  ASSET  commands.  The  commands  arc  listed 
in  an  order  in  which  they  can  be  executed.  The  section  ends  with  some  comments  on  the 

relationship  between  the  commands.  Additional  commands  are  described  in  the  next  sec- 
tion, 

Invoking  ASSET 

We  will  refer  to  the  directory  from  which  .ASSET  is  invoked  as  the  "current  work- 
ing directory".  Before  invoking  ASSET  you  should  set  up  a  "subject  director."  which 
contains  the  source  of  the  program  to  be  tested.  It  is  usually  convenient  to  have  the  sub- 
ject director)  be  a  subdirectory  of  current  working  directory,  but  this  is  not  necessary. 
The  subject  directory  should  mn  be  the  same  as  the  current  working  director. .  To  enter 
ASSET  type  "asset"  in  response  to  the  I  NIX  prompt  ASSET  will  rcsp<">nd  \Mth 
Welcome  to  ASSET.  For  help  type  "help." 
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Enter  relative  pathname  of  initial  default  directory.  »: 

At  this  point  you  should  enter  the  pathname  of  the  subject  directory,  relative  to  the 
current  working  directory.  ASSET  will  then  print  the  ASSET  "command  level  prompt", 
">»:  ",  signifying  that  it  is  ready  to  accept  a  command. 

We  now  describe  each  of  the  ASSET  commands.  These  commands  can  be  invoked 
when  ASSET  types  the  command  level  prompt.  Some  of  these  commands  will  query  the 
user  for  additional  information.  When  this  occurs,  ASSET  will  prompt  the  user  with  its 
"inner  prompt",  "»:  ".  Sometimes  ASSET  will  ask  the  user  a  question  which  has  a  yes 
or  no  answer.  The  user  should  respond  with  "Y"  or  "y"  for  yes  and  "N"  or  "n"  for  no. 
ASSET  will  indicate  the  default  (if  there  is  one)  in  brackets.  If  the  default  is  "Y"  then 
any  non-null  response  which  does  not  begin  with  "N"  or  "n"  will  be  considered  to  be  "Y". 
Similarly,  if  the  default  is  "N"  then  any  non-null  response  which  does  not  begin  with  "Y" 
or  "y"  will  be  considered  to  be  "N".  If  the  user  just  hits  carriage  return,  the  inner  prompt 
will  continue  to  appear  until  a  non-null  response  is  given.  The  logical  dependence  of 
ASSET  commands  is  given  in  Figure  1.  The  commands  must  be  typed  in  lower  case 
letters.  Any  command  can  be  abbreviated  by  its  first  three  letters. 

begin 

The  "begin"  command  analyzes  the  subject  procedure,  breaking  it  into  basic  blocks, 
constructing  its  flow  graph,  and  determining  the  blocks  in  which  each  variable  has 
definitions  and  c-uses  and  the  edges  on  which  each  variable  has  p-uses.  An  adjacency 
list  for  the  graph  is  written  to  the  file  "graph",  and  a  representation  of  the  data-flow  infor- 
mation is  written  to  the  file  "defuse".  The  "String  Table"  in  which  ASSET  has  recorded 
the  identifiers  appearing  in  the  subject  program  is  written  to  the  file  "StringTabPile". 

In  addition,  ASSET  constructs  a  modified  version  of  the  subject  program  in  which 
probes  have  been  insened.  That  is,  at  the  beginning  of  basic  block  i,  the  statement 
"writeln(traversed,i  :FW);"  is  inserted.  Declarations  of  the  text  file  "traversed",  the  con- 
stant "FW",  and  any  labels  which  appear  in  the  modified  program  are  also  inserted.  We 
will  refer  to  the  resulting  program  as  the  "modified  subject  program".  ASSET  writes  a 
pretty-printed  version  of  the  modified  subject  program  to  the  file  "copy.p". 

ASSET  also  creates  files  "copy",  "copy2"  (preliminary  versions  of  the  modified  sub- 
ject program)  and  "SymTabFile"  (a  representation  of  the  Symbol  table  for  the  subject 
program)  which  will  not  be  used  by  subsequent  commands,  but  may  be  useful  for  debug- 
ging- 

When  you  type  "begin",  ASSET  will  respond  with 
Enter  name  of  subject  procedure  file.  »: 
If  there  is  no  file  of  this  name  in  the  subject  directory  ASSET  will  print  an  error  message 
and  return  to  the  command  level.   Otherwise  ASSET  will  copy  the  file  to  the  file  "sub- 
ject", then  say 

Separate  Compilation?  (Y/N)[N]  »: 
Answer  "y"  if  the  subject  program  is  separately  compiled,  "n"  otherwise.    We  will 
assume  throughout  the  rest  of  this  section  that  the  subject  program  is  NOT  separately 
compiled.   See  section  5  for  instructions  on  using  ASSET  on  separately  compiled  pro- 
grams. 
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Next,  ASSET  will  ask  you  for  the  name  of  the  procedure  to  be  instrumented  for  test- 
ing. You  may  either  supply  the  name  of  the  procedure,  or  direct  ASSET  to  prompt  you 
with  the  names  of  the  procedures  in  the  program. 

select 

The  "select"  command  causes  ASSET  to  print  a  menu  and  query  the  user  to  select 
one  of  the  data  flow  testing  criteria. 

find-associations 

This  command  causes  ASSET  to  produce  a  list  of  all  definition-use  associations  (of 
the  appropriate  type)  in  the  subject  procedure.  If  the  selected  criterion  is  All-du-paths, 
find-associations  writes  a  list  of  all  of  the  du-paths  in  the  subject  procedure  to  the  file 
"paths".  Otherwise,  find-associations  writes  a  list  of  all  of  the  definition-c-use  associa- 
tions and  definition-p-use  associations  in  the  subject  procedure  to  the  file  "pairs".  Note 
that  in  the  worst  case,  finding  all  of  the  du-paths  may  take  time  exponential  in  the 
number  of  basic  blocks. 

compile 

The  "compile"  command  invokes  the  Pascal  compiler  to  compile  the  modified  sub- 
ject program.  The  executable  code  is  written  to  the  file  "copy. exec".  Warning  and  Error 
messages  generated  by  the  compiler  are  written  to  the  file  "comp.msgs". 

run 

The  "run"  command  executes  the  modified  subject  program,  on  one  or  more  test 
cases  and  adds  the  program  trace  to  the  file  "traversed".  When  you  type  "run",  ASSET 
may  respond  with 

File  'traversed'  already  exists.  Type  'Y'  if  you  want  to  append  to  it,  'N' 

otherwise.  »: 
Type  "Y"  if  you  want  to  add  the  program  traces  generated  by  previous  test  cases  to  the 
current  one,  "N"  otherwise.   If  you  answer  no,  ASSET  will  ask  you  whether  you  want  to 
save  the  old  file  "traversed"  (i.e.  the  program  traces  of  previous  test  cases). 

Next,  ASSET  will  ask  you  whether  the  program  takes  input  from  command  line 
arguments.  Answer  yes  if  the  subject  program  reads  command  line  arguments  via  the 
Berkeley  Pascal  argv  mechanism  or  if  you  would  like  to  redirect  the  standard  input  or 
output  using  the  UNIX  redirection  mechanism.  If  you  answer  yes,  ASSET  will  prompt 
you  for  the  command  line  arguments.  For  example  supplying 

<infilc  >outfile 
as  the  command  line  argument  will  cause  the  modified  subject  program  to  read  its  stan- 
dard input  from  the  file  "mfile"  and  v^xitc  its  standard  output  to  the  file  "outfilc". 

ASSET  will  then  say 

Executing  modified  subject  program  ... 
then  will  execute  the  modified  subject  program  with  the  command  line  arguments  you've 
supplied.  When  execution  terminates,  ASSET  will  ask  you  whether  you  would  like  to 
run  the  program  on  additional  test  data.  If  you  answer  yes,  ASSET  will  again  ask  you  for 
command  line  arguments  and  execute  the  program.  The  new  program  trace  will  be 
added  to  the  traces  which  have  previously  been  uniten  to  the  file  "traversed".    If  you 


-  158- 

answer  no,  ASSET  will  return  to  the  command  level. 

Note  that  one  trace  is  produced  for  each  run  of  the  subject  program,  not  one  for 
each  invocation  of  the  subject  procedure.  Thus  a  single  program  trace  may  contain 
several  paths  from  the  entry  to  the  exit  of  the  subject  procedure.  If  the  subject  procedure 
is  recursive  these  paths  will  be  "nested"  within  one  another. 

check 

Tlie  "check"  command  checks  whether  the  selected  criterion  is  satisfied  by  the  test 
data  whose  program  traces  are  in  the  file  "traversed".  ASSET  writes  to  the  file  "results"  a 
list  of  those  def-use  associations  which  are  required  by  the  criterion  but  have  not  been 
covered  by  the  test  data.  (Before  doing  this,  ASSET  moves  the  old  version  of  the  file 
"results"  to  the  file  "results. old".  This  allows  the  user  to  easily  see  which  def-use  associa- 
tions were  eliminated  by  the  most  recent  additions  to  the  test  set  by  using  the  UNIX  com- 
mand "diff  results  results. old").  If  the  file  "traversed"  does  not  exist,  ASSET  initializes  it 
to  represent  an  empty  set  of  program  traces.  In  this  case,  "results"  will  contain  a  list  of 
all  of  the  def-use  associations  required  by  the  criterion. 

save 

The  "save"  command  moves  the  files  "subject",  "copy.p",  "copy. exec",  "traversed", 
"results",  "pairs",  "paths",  "defuse",  etc.  from  the  current  working  directory  to  the  sub- 
ject directory. 


exit 


Moves  important  files  to  the  subject  directory  and  removes  the  rest,  then  exits. 


Note  on  the  relationship  between  the  commands 

Before  checking  whether  a  set  of  test  data  satisfies  any  of  the  criteria  All-defs,  All- 
p-uses,  All-c-uses,  All-c-uses/some-p-uses,  All-p-uses/some-c-uses  or  All-uses  it  is 
necessary  for  ASSET  to  create  a  list  of  definition-use  associations  in  the  file  "pairs". 
Before  checking  whether  a  set  of  test  data  satisfies  the  criterion  All-du-paths  it  is  neces- 
sary for  ASSET  to  create  a  list  of  du-paths  in  the  file  "paths".  When  the  most  recenUy 
selected  criterion  is  All-defs,  All-p-uses,  All-c-uses,  All-c-uses/some-p-uses,  All-p- 
uses/some-c-uses  or  All-uses  the  "find-associations"  command  creates  the  file  "pairs". 
When  the  most  recently  selected  criterion  is  All-du-paths  the  "find-associations"  com- 
mand creates  the  file  "paths".  Thus  it  is  necessary  to  use  the  "find-associations"  com- 
mand once  before  the  first  time  that  "check"  is  invoked  with  one  of  the  criteria  involving 
"pairs"  and  once  before  the  first  time  that  "check"  is  invoked  with  the  criterion  du-paths. 

Suppose  the  user  has  been  attempting  to  satisfy  criterion  C  then  decides  to  switch  to 
another  criterion,  C.  Let  T  be  the  set  of  test  data  on  which  the  program  has  just  been  run 
(i.e.  assume  that  set  of  program  traces  in  the  file  "traversed"  corresponds  to  test  set  T). 
Then  to  see  how  close  T  comes  to  satisfying  C  the  user  need  only  invoke  the  "select- 
criterion"  command  (to  select  the  new  criterion,  C)  and  the  "check"  command.  It  is  not 
necessary  to  run  the  program  on  test  set  T  again. 
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4.  OTHER  ASSET  COMMANDS 

csh 

Spawns  a  new  shell  from  which  you  can  invoke  your  favorite  UNIX  commands.  To 
exit  from  this  shell  and  return  to  the  ASSET  command  level  type  control-D. 

directory 

Displays  the  name  of  the  subject  directory  and  prompts  the  user  for  a  new  subject 
directory. 

graphics 

The  "graphics"  command  writes  a  PIC  program  describing  a  graphical  representa- 
tion of  the  subject  procedure's  flow  graph.  The  PIC  program  is  written  to  the  file 
"picfile".  To  obtain  a  picture  of  the  flow  graph,  run  the  troff  preprocessor  "pic"  (from  a 
UNIX  shell,  not  directly  from  ASSET)  with  picfile  as  input  then  pipe  it  through  troff. 

[At  NYU,  pipe  the  pic  output  through  itroff  (for  output  on  a  Canon  Laser  Printer),  psroff 
(for  output  on  an  Apple  Laser  Writer),  or  preroff  (for  output  on  the  console  if  you  are  in 
the  Suntools  environment.)  For  example,  typing 

%  pic  picfile  I  psroff  -Pap3 
(where  "%"  is  the  UNIX  prompt)  will  send  printed  output  to  the  LaserWriter  in  room 
505.] 

help 

On  line  help  facility. 

purge 

Removes  (i.e.  destroys)  all  of  the  files  "subject",  "copy.p",  "copy. exec",  "traversed", 
"results",  "pairs",  "paths",  "defuse",  etc.  from  the  current  working  directory.  Handle 
with  care. 

sepcomp 

Compiles  and  links  separately  compiled  subject  program.  See  section  5. 

restore 

Copies  the  files  "subject",  "copy.p".  "copy. exec",  "traversed",  "results",  "pairs", 
"paths",  "defuse",  etc.  from  the  subject  directory  to  the  current  working  directory,  thus 
restoring  a  previous  ASSET  session.  If  the  files  in  the  subject  directory  were  created  by 
running  ASSET  on  a  separately  compiled  subject  program  some  additional  action  is 
taken  (sec  section  5.) 

view 

The  view'  command  directs  ASSET  to  display  the  contents  of  a  (text)  file  on  the 
screen,  a  screenful  at  a  time.  If  you  type  "view  <filcname>"  ASSET  will  display  that  file. 
Otherwise.  ASSET  will  prompt  you  for  the  file  name  The  view  command  works  by 
invoking  the  UNIX  command    more",  hence  responds  to  the  same  inputs  as  docs  "more". 
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Most  importantly,  type  carriage  return  to  display  the  next  line  of  text,  space  to  display  the 
next  screenful  of  text  and  q  to  return  to  the  ASSET  command  level.  For  other  options, 
see  the  UNIX  manual  page  for  "more". 

5.  USING  ASSET  ON  SEPARATELY  COMPILED  PROGRAMS 

This  section  describes  how  to  use  ASSET  to  test  separately  compiled  programs. 
This  section  assumes  some  familiarity  with  Berkeley  Pascal's  separate  compilation 
mechanism.  (See  the  relevent  sections  of  the  Berkeley  Pascal  manual.) 

To  instrument  a  procedure  for  testing,  ASSET  must  create  modified  versions  of  the 
file  containing  the  source  for  that  procedure,  the  file  containing  the  source  for  the  main 
program,  and  a  file  containing  global  variable  declarations.  To  create  an  executable  code 
for  the  modified  subject  program,  ASSET  must  recompile  and  relink  the  modified  files 
and  any  other  files  dependent  on  them. 

In  order  to  do  this,  ASSET  assumes  that  subject  directory  contains  a  Makefile 
describing  the  dependency  information  (see  documentation  for  the  UNIX  command 
"make"),  that  there  is  a  file  called  "globals.h"  which  has  global  variable  declarations,  and 
such  that  the  object  files  for  the  main  program  and  for  the  procedure  to  be  instrumented 
are  dependent  on  globals.h. 

If  the  user  answers  "yes"  to  the  question  "Separate  compilation?"  during  execution 
of  ASSET'S  "begin"  command,  ASSET  prompts  the  user  for  the  names  of  the  file  con- 
taining the  main  program  and  the  file  containing  the  subject  procedure.  We'll  refer  to 
these  as  <main_file>  and  <subject_proc_file>,  respectively.  ASSET  moves  <main_file>, 
<subject_proc_file>,  and  "globals.h"  to  <main_file>.bak,  <subject_proc_file>.bak,  and 
"global s.h.bak",  respectively.  ASSET  writes  the  modified  versions  of  the  files  to 
<main_file>,  <subject_proc_file>,  and  "globals.h".  Thus  when  the  UNIX  "make"  com- 
mand is  invoked,  the  modified  subject  program  will  be  compiled  and  linked. 

To  compile  the  modified  subject  program  use  the  ASSET  command  "sepcomp", 
rather  than  the  ASSET  "compile"  command.  Sepcomp  invokes  the  UNIX  make  com- 
mand, and  moves  the  executable  code  (which  is  assumed  to  be  in  "<subject- 
directory>/a.out")  to  copy. exec. 

When  the  "save"  command  is  invoked,  ASSET  looks  for  files  with  the  ".bak"  exten- 
sion in  the  subject  directory.  For  each  such  file,  ASSET  moves  <fname>  to 
<fname>.mod  and  moves  <fname>.bak  to  <fname>.  Thus  the  original  files  have  the 
same  contents  as  they  had  before  the  ASSET  session,  and  the  modified  versions  have  the 
extension  ".mod". 

The  "restore"  command  looks  for  files  with  the  ".mod"  extension  and  queries  the 
user  as  to  whether  they  should  be  restored.  If  the  user  answers  yes  to  the  question 
"restore  <fname>.mod  (Y/N)?"  then  ASSET  moves  <fname>to  <fname>.bak  and 
moves  <fname>.mod  to  <fname>.  Thus  to  restore  a  session  in  which  the  instrumented 
procedure  was  in  file  "Proc3.p",  the  main  program  was  in  file  "main.p",  answer  yes  to  the 
questions  "restore  ProcS.p.mod?",  "restore  main.p. mod?",  and  "restore  globals.h. mod?" 

Restrictions  on  separately  compiled  programs  : 

The  separate  compilation  option  makes  the  following  assumptions: 
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1.  All  relevant  files  are  in  the  subject  directory. 

2.  There  is  a  file  called  "globals.h"  containing  global  definitions 
in  the  subject  directory. 

3.  "globals.h"  is  mentioned  in  #include  directives  in  the  subject 
procedure  file  and  in  the  main  program  file. 

4.  There  is  a  Makefile  containing  the  dependency  information,  etc. 
needed  to  re-link  and  load  the  program  contained  in  the  subject 
directory. 

5.  The  Makefile  directs  the  executable  code  to  file  "a.out". 

6.  The  procedure  being  instrumented  is  at  the  outer  level 
of  nesting. 

7.  TTie  last  included  .h  file  cannot  have  any  extra  garbage  at  the 
end.  (This  is  a  bug). 

NB  —  It  is  currently  NOT  POSSIBLE  to  instrument  the  main  program 
of  a  separately  compiled  module. 

6.  EXAMPLE  OF  AN  ASSET  SESSION 

In  this  section  we  present  an  annotated  example  of  an  ASSET  session.  To  distin- 
guish between  text  written  by  the  system  and  that  written  by  the  user,  we  display  text 
entered  by  the  user  in  boldface  type.  Comments  are  written  in  italics. 

Example  1: 

This  example  shows  an  ASSET  session  in  which  a  brute-force  string-matching  pro- 
cedure is  analyzed.  The  program  reads  a  string  and  a  pattern.  It  is  supposed  to  print  the 
position  in  the  stnng  at  which  the  partem  first  appears,  and  print  0  if  the  pattern  never 
appears  in  the  string.  The  current  working  directory  has  a  subdirectory  called  "StrMtch". 
The  file  "StrMtch/subject.p"  contains  the  following  program: 
program  TestStringMatch  (input, output); 
const  MAX  =  80; 

LENGTH  =  10; 
type  Source  =  array[  1  ..MAX]  of  char; 

String  =  arrayl  1  ..LENGTH]  of  char; 
var      Pat  :  String;  / 

Txt  :  Source; 
i.result.TxtLen   :  integer; 

function  StringMatch(Pattem:String;  SorTcxt:  Source; 

Patlxn,  SorLen  :  integer):  integer. 
{Brute  force  pattern-matcher.  Returns  0  for  no  match) 
var  PatPos.  SorPos  :  integer; 
begin 

PatPos  :=  1; 
SorPos  :=  I; 
repeat 

if  Pattcml PatPos]  =  SorTcxij SorPos]  then 
begin 

SorPos  :=  SorPos  +  1; 
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PatPos  :=  PatPos  +  1 
end  {then) 
else 
begin 

SorPos  :=  (SorPos  -  PatPos)  +  2; 
PatPos  :=  1 
end;  {else} 
until  (PatPos  >  PatLen)  or  (SorPos  >  SorLen); 
if  PatPos  >  PatLen 
then  StringMatch  :=  SorPos  -  PatLen 
else  StringMatch  :=  0 
end;  '{StringMatch) 

begin  {main  program) 
{Read  Input) 

writelnCENTER  THE  TEXT'); 
i:=l; 

while  (not(eoln)  and  (i  <=  MAX))  do 
begin 

read(Txt[i]); 

i  :=  i  +  1 
end; 

TxtLen  :=  i  -  1; 
readln; 

writelnCENTER  THE  PATTERN'); 
i:=l; 

while  (not(eoln)  and  (i  <=  LENGTH))  do 
begin 

read(Pat[i]); 

i  :=  i  +  1 
end; 
readln; 

result  :=  StringMatch(Pat,Txt,i  -  1,  TxtLen); 
writeln('The  pattern  first  appears  at  position  ',result:3, 
'  in  the  text.'); 
end. 

Script  staned  on  Sun  May  31  13:09:59  1987 
csd27>  asset 

Welcome  to  ASSET.  For  help  type  "help." 


Enter  relative  pathname  of  initial  default  directory. 
»:  StrMtch 


»>:  begin 
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Enter  name  of  subject  procedure  file. 
»:  subject. p 

Separate  Compilation?  (Y/N)  [N] 

»:  n 

Enter  the  name  of  the  procedure  to  be  instrumented. 

If  you  would  like  to  be  prompted  with  the  names  of 

the  procedures  in  the  subject  program,  just  hit  carriage  return. 

»: 

=>  Should  TestStringMatch 

be  instrumented  for  testing?  (Y/N) 

»:  n 

==>  Should  StringMatch 

be  instrumented  for  testing?  (Y/N) 

»:y 

»>:  select 

SELECT  A  CRITERION 

A.  All-defs 

B.  All-c-uses 

C.  AU-p-uses 

D.  AU-c-uses/some-p-uses 

E.  All-p-uses/some-c-uses 

F.  All-uses 

G.  AU-du-paths 

Enter  letter  representing  the  selected  criterion 
»:  a 

Criterion  is  All-defs. 

>»:  find 

We  next  check  whether  the  criterion  has  been  satisfied  with  no  test  data.  This  is  not 
necessary,  but  by  doing  this,  we  get  a  list  of  all  of  the  def-c-use  and  def-p-use  associa- 
tions in  the  program. 
»>:  check 
ALL-DEFS: 

Still  must  exercise  at  least  one  of  the  following  def-clear  paths: 
with  respect  to  from         to 

Pattern  1       (  3.  5) 

Pattern  1        (   3.  4) 

AND 

Still  must  exercise  at  least  one  of  the  following  def-clear  paths: 
with  respect  to  from         to 


8 

(  6,  3) 

(  6,  7) 

(  7,  9) 

(  7,  8) 
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SorText  1       (  3,  5) 

SorText  1       (  3,  4) 

AND 

Still  must  exercise  at  least  one  of  the  following  def-clear  paths: 

with  respect  to  from         to 

PatLen 

PatLen 

PatLen 

PatLen 

PatLen 

AND 

Still  must  exercise  at  least  one  of  the  following  def-clear  paths: 
with  respect  to  from         to 

SorLen  1       (  6,  3) 

SorLen  1       (  6,  7) 

AND 

Still  must  exercise  at  least  one  of  the  following  def-clear  paths: 

with  respect  to  from        to 

PatPos  2        4 

PatPos  2        5 

PatPos  2      (  3,  5) 

PatPos  2       (  3,  4) 

AND 

Still  must  exercise  at  least  one  of  the  following  def-clear  paths: 

with  respect  to  from         to 

SorPos  2         4 

SorPos  2         5 

SorPos  2       (  3,  5) 

SorPos  2       (  3,  4) 

ANT) 

Still  must  exercise  at  least  one  of  the  following  def-clear  paths: 

with  respect  to  from         to 

PatPos 

PatPos 

PatPos 

PatPos 

PatPos 

PatPos 

PatPos 

PatPos 

AND 

Still  must  exercise  at  least  one  of  the  following  def-clear  paths: 


4 

4 

4 

5 

4 

(  3,  5) 

4 

(  3,  4) 

4 

(  6,  3) 

4 

(  6,  7) 

4 

(  7,  9) 

4 

(  7,  8) 
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with  respect  to 

from 

to 

SorPos 

4 

4 

SorPos 

4 

5 

SorPos 

4 

8 

SorPos 

4 

(  3, 

5) 

SorPos 

4 

(  3, 

4) 

SorPos 

4 

(  6, 

3) 

SorPos 

4 

(  6, 

7) 

AND 

Still  must  exercise  at  least  one  of  the  following  def-clear  paths; 


with  respect  to 

from        tc 

PatPos 

5 

4 

PatPos 

5 

5 

PatPos 

5 

(  3,  5) 

PatPos 

5 

(  3,  4) 

PatPos 

5 

(  6,  3) 

PatPos 

5 

(  6,  7) 

PatPos 

5 

(  7,  9) 

PatPos 

5 

(  7,  8) 

AND 

Still  must  exercise  at  least  one  of  the  following  def-clear  paths: 


with  respect  to 

from 

to 

SorPos 

5 

4 

SorPos 

5 

5 

SorPos 

5 

8 

SorPos 

5 

(  3, 

5) 

SorPos 

5 

(  3, 

4) 

SorPos 

5 

(  6. 

3) 

SorPos 

5 

(  6, 

7) 

To  look  at  these  again  use  the  command  'view  results'. 

Next  we  will  compile  the  program  and  start  running  it  on  some  test  data.  As  the  initial 
test  data  set,  we  select  one  element  in  which  the  pattern  appears  in  the  string  and  one 
element  in  which  the  pattern  does  not  appear  in  the  string. 
»>:  compile 
Compilation  begins  ... 
Done,  and  successful. 

»>:  run 

File  'traversed'  already  exists. 

Do  you  want  to  append  to  it?  (Y/N)  f Y] 

»:  n 

Do  you  want  to  save  old  'traversed'?  (Y/N)  [N] 

»:  n 

Command  line  arguments?  (Y/N)  [Y] 
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»:  n 

Executing  mcxiified  subject  program  ... 

ENTER  THE  TEXT 
The  quick  brown  fox 

ENTER  THE  PATTERN 
quick 

The  pattern  first  appears  at  position    5  in  the  text. 

Do  you  want  to  run  the  subject  program 

on  some  additional  test  data?  (Y/N)  [N] 

»:y 

Command  line  arguments?  (Y/N)  [Y] 

»:  n 

Executing  modified  subject  program  ... 

ENTER  THE  TEXT 
The  quick  brown  fox 

ENTER  THE  PATTERN 
quack 

The  pattern  first  appears  at  position    0  in  the  text. 
Do  you  want  to  run  the  subject  program 
on  some  additional  test  data?  (Y/N)  [N] 
»:  n 

>»:  check 

ALL-DEFS: 
CRITERION  SATISFIED 

To  look  at  these  again  use  the  command  'view  results'. 

The  test  set  satisfies  the  all-defs  criterion.    We  next  check  whether  the  same  test  set 
satisfies  a  stronger  criterion,  all-uses. 
»>:  select 

SELECT  A  CRITERION 

A.  AU-defs 

B.  All-c-uses 

C.  All-p-uses 

D.  AU-c-uses/some-p-uses 

E.  All-p-uses/some-c-uses 

F.  All-uses 

G.  AU-du-paths 


Enter  letter  representing  the  selected  criterion 
»:f 
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Criterion  is  All-uses. 


>»;  check 

ALL-USES 

Still  need  to  exercise  all  of  the  following  of  def-clear  paths: 


with  respect  to 

from 

to 

PatPos 

2 

4 

SorPos 

2 

4 

SorPos 

5 

8 

PatPos 

2 

(  3, 

4) 

SorPos 

2 

(  3, 

4) 

PatPos 

4 

(  7, 

9) 

PatPos 

5 

(  7, 

8) 

To  look  at  these  again  use  the  command  'view  results'. 

To  aid  in  the  selection  of  test  data  which  cover  the  remaining  def-use  associations,  the 
user  can  draw  the  flow  graph  (see  Figure  2)  and  use  "copy.p"  to  aid  in  labeling  each 
node  with  the  corresponding  code.  Notice  that  for  each  i,  block  i  begins  with  the  state- 
ment "write(  traversed, i:FW);" 

>»:  view  copy.p 

program  TesiStringMatch(traversed,  input,  output); 

var 

traversed:  text; 
const 

MAX  =  80; 

LENGTH  =  10; 
type 

Source  =  array  [1..MAX]  of  char; 

String  =  array  (1.. LENGTH]  of  char 
var 

Pat:  String; 

Txt:  Source; 

i,  resuh,  TxtLen:  integer; 

function  StringMatch(Pattern:  String;  SorText:  Source;  PatLen,  SorLen;  integer):  integer; 

label 

10; 
const 

RV  =  4; 
var 

PatPos,  SorPos:  integer; 

begin 
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write(traversed,  1:  FW); 
write(traversed,  2:  FW); 
PatPos:=  1; 
SorPos  :=  1; 
10: 

write(traversed,  3:  FW); 
if  Pattem[PatPos]  =  SorText[SorPos]  then  begin 

write(traversed,  4:  FW); 

begin 

SorPos  :=  SorPos  +  1; 
PatPos  :=  PatPos  +  1 

end 
end  else  begin 

write(traversed,  5:  FW); 

begin 

SorPos  :=  SorPos  -  PatPos  +  2; 
PatPos  :=  1 

end 
end; 

write(traversed,  6:  FW); 
if  not  ((PatPos  >  PatLen)  or  (SorPos  >  SorLen))  then 

goto  10; 
write(traversed,  7:  FW); 
if  PatPos  >  PatLen  then  begin 

write(traversed,  8:  FW); 

StringMatch  :=  SorPos  -  PatLen 
end  else  begin 
write(traversed,  9:  FW); 

StringMatch  :=  0 
end; 

write(traversed,  10:  FW); 
write(traversed,  11:  FW) 
end;  {  StringMatch  ) 

begin 

rewrite  (traversed); 

writelnCENTER  THE  TEXT); 

i:=l; 

while  not  eoln  and  (i  <=  MAX)  do  begin 

read(Txt[i]); 
i  :=  i  +  1 

end; 

TxtLen  :=  i  -  1; 

readln;  

writelnCENTER  THE  PATTERN'); 
i:=l; 
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while  not  eoln  and  (i  <=  LENGTH)  do  begin 

read(Pat[i]); 
i:=i  +  1 

end; 

readln; 

result  :=  StringMatch(Pat,  Txt,  i  -  1,  TxtLen); 

writelnCThe  pattern  first  appears  at  position  ', 

end.  {  TestStringMatch  ) 


result:  3,  '  in  the  text.') 


Examining  the  annotated  flow  graph,  we  see  that  in  order  to  execute  a  path  from  2  to  4 
which  is  definition  clear  with  respect  to  PatPos,  a  test  case  in  which  the  first  character  of 
the  string  matches  the  first  character  of  the  pattern  is  needed.  We  run  the  program  on 
such  a  test  case,  adding  its  trace  to  those  produced  by  the  test  cases  run  previously. 


>»:  run 

File  'traversed'  already  exists. 

Do  you  want  to  append  to  it?  (Y/N)  [Y] 

»:  y 

Command  line  arguments?  (Y/N)  [Y] 
»:  n 

Executing  modified  subject  program  ... 

ENTER  THE  TEXT 
The  quick  brown  fox 

ENTER  THE  PATTERN 
The 

The  pattern  first  appears  at  position    1  in  the  text. 
Do  you  want  to  run  the  subject  program 
on  some  addilionai  test  data?  (Y/Nj  [NJ 
»:  n 

»>:  check 

Still  need  to  exercise  all  of  the  following  of  def-clear  paths: 

with  respect  to  from  to 

Sorl\^s  5         8 

PatPos  4        (  7.  9) 

PatPos  3        (  7.  8) 

To  look  at  these  again  use  the  command  'view  results'. 

Examining  the  annotated  flow  graph,  we  see  that  in  order  to  execute  a  path  from  5  to  8 
which  IS  definition  clear  with  respect  to  SorPos,  a  test  case  in  which  the  pattern  is  the 
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null  string  is  needed.   We  run  the  program  on  such  a  test  case,  adding  its  trace  to  those 
produced  by  the  test  cases  run  previously. 


>»:  run 

File  'traversed'  already  exists. 

Do  you  want  to  append  to  it?  (Y/N)  [Y] 

»:y 

Command  line  arguments?  (Y/N)  [Y] 

»:  n 

Executing  modified  subject  program  ... 

ENTER  THE  TEXT 
The  quick  brown  fox 
ENTER  THE  PATTERN 

The  pattern  first  appears  at  position    2  in  the  text. 

Examining  the  program's  output,  we  see  that  the  program  has  reported  that  the  null 

string  first  appears  in  position  2  of  the  string.  This  is  an  error!  The  reader  should  note 

that  this  bug  is  guaranteed  to  be  detected  by  any  test  set  which  is  adequate  according  to 

the  all-uses  criterion.  Having  discovered  a  bug,  we  save  the  ASSET  session  and  prepare 

to  report  the  error. 

Do  you  want  to  run  the  subject  program 

on  some  additional  test  data?  (Y/N)  [N] 

»:  n 

>»:  save 

Note  that  to  cover  the  one  remaining  association,  (4,(7,9)J'atPos),  we  would  have 
to  include  a  test  case  in  which  the  first  k  characters  of  the  pattern  matched  the  last  k 
characters  of  the  text,  for  some  k  s.t.  0  <  k  <  the  length  of  the  pattern. 


r 


i-»  I 


begin 


select  additional 

test  data 
select  different 

criterion 
report  errror 
release  software 


o 


A 

user 


\ 


results 


Figure  1 

Schematic  diagram  of  ASSET 

P  is  program    C  is  criterion.  I  is  Test  Set. 
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APPENDIX  n 
Algorithm  for  derivation  of  initial  path  expression 

In  this  appendix  we  present  an  algorithm  which  takes  as  input  a  flow  graph  G,  and  a 
definition-use  association  (d,u,v)*  and  which  produces  a  path  expression  ^  such  that  all 
viable  candidates  are  contained  in  PATHS (^).  The  length  of  the  path  expression  ^  is 
0(N)  where  N  is  the  number  of  nodes  in  the  flow  graph.  Tlie  algorithm  runs  in  time 
0(N).  Throughout  this  section  we  assume  that  the  flow  graph  G  is  formed  according  to 
the  rules  in  Figure  2.1.1,  from  a  subprogram  having  no  repeat-until  statements. 

The  algorithm  is  based  on  a  generalization  of  the  notion  of  dominators  [HEC77]. 
Recall  that  node  i  dominates  node  j  if  and  only  if  every  path  from  the  entry  node  to  node 
j  passes  through  i;  i  properly  dominates  j  if  and  only  if  i?tj  and  i  dominates  j;  i  immedi- 
ately dominates  j  iff 

a)  i  properly  dominates  j,  and 

b)  if  k  properly  dominates  j  and  k^^i  then  k  (properly)  dominates  i. 
We  will  say  that  node  i  s-dominates  node  j  if  and  only  if 

a)  i  and  j  are  reachable  from  s,  and 

b)  every  path  from  from  s  to  j  passes  through  i. 

We  will  say  that  i  properly  s-dominates  j  if  and  only  if  i^j  and  i  s-dominates  j;  i  immedi- 
ately s-dominates  j  iff 

a)  i  properly  s-dominates  j.  and 


•  Aciuall) .  ihc  algonlhm  makes  no  use  of  ihc  vanablc  v.  However,  it  docs  assume  that  v  has  a 
dcfiniuon  in  node  d.  As  in  chapter  5.  we  assume,  without  loss  of  generality,  that  u  is  a  node,  rath- 
er than  an  edge. 
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b)  if  k  properly  s-dominates  j  and  k?ti  then  k  (properly)  s-dominates  i. 

We  use  s-DOM(j)  to  denote  {i  I  i  s-dominates  j}. 

Consider  the  graph  G^  obtained  from  G  by  deleting  all  nodes  and  edges  which  are 
not  reachable  from  s,  deleting  all  edges  of  the  form  (i,s)  and  making  s  the  entry  node.  It 
follows  immediately  from  the  definitions  that  i  s-dominates  j  in  G  if  and  only  if  i  dom- 
inates j  in  G^.  With  this  observation,  the  following  properties  of  the  s-dominance  rela- 
tion follow  from  the  analogous  propenies  of  the  dominance  relation. 

LEMMA  AD.  1 :  s-DOM(s)  =  { s ) . 

LEMMA  AH. 2:  The  s-dominance  relation  of  a  flow  graph  is  a  partial  ordering. 

LEMMA  AD. 3:  Node  s  s-dominates  all  nodes  which  are  reachable  from  s. 

LEMMA  AII.4:  The  s-dominators  of  a  node  form  a  chain  (i.e.  a  linear  ordering). 

LEMMA  AD. 5:  Every  node  except  s  which  is  reachable  from  s  has  a  unique  immediate 
dominator. 

LEMMA  AD. 6:  The  graph  of  the  reflexive  and  transitive  reduction  of  the  s-dominance 
relation  of  a  flow  graph  is  a  forest.  There  is  one  tree  T  in  this  forest  which  contains  all 
and  only  those  nodes  which  are  reachable  from  s.  There  is  an  arc  (i  j)  in  T  if  and  only  if 
i  immediately  dominates  j,  and  there  is  a  path  in  T  fi-om  i  to  j  if  and  only  if  i  dominates  j. 

We  note  that  for  a  given  s,  all  of  the  sets  s-DOM(j)  can  be  computed  in  time  propor- 
tional to  the  number  of  nodes  in  G  as  follows. 

1)  Perform  a  depth  first  search  on  G  starting  at  s  to  determine  which  nodes  are  in  G^ 
and  to  derive  a  reverse  postordering  of  the  nodes  in  G^. 

2)  Apply  the  dominator  algorithm  for  reducible  flow  graphs  [?iEC77]  to  compute 
the  sets  DOM(j)  in  G^.  By  the  above  observation,  these  are  the  sets  s-DOM(j)  in  G. 
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Each  of  these  steps  takes  0(e)  bit  vector  operations,  where  e  is  the  number  of  edges  in  G. 
Since  G  is  a  structured  flow  graph,  e=0(IGI).  To  justify  step  2,  we  note  that  if  G'^  were 
irreducible  it  would  have  a  "forbidden  subgraph"  [HEC77].  TTien  G  would  also  have  a 
"forbidden  subgraph".  But  since  G  is  a  structured  flow  graph,  it  is  reducible,  hence  has 
no  "forbidden  subgraph." 

Recall  that  given  a  definition-use  association  (d,u,v),  algorithm  PEM2  (Algorithm 
5.5.2)  needs  to  build  up  the  path  expression  4,  representing  the  initial  set  of  candidates, 
from  three  components,  a''*^  represents  the  set  of  paths  from  the  entry  node  to  node  d; 
a"^"  represents  the  set  of  paths  from  node  d  to  node  u;  a"'"  represents  the  set  of  paths 
from  the  entry  node  to  node  d; 

We  will  present  an  algorithm  which,  given  a  "source"  node  s  and  a  "target"  node  t, 
will  produce  a  path  expression  which  represents  all  of  the  paths  (n\,...,ni,)  where  ni=s, 
/i;t=t.  and  ny{s,t)  for  l<j<k.*  The  intuition  motivating  the  algorithm  is  that  every  path 
from  s  to  t  must  pass  through  all  of  the  s-dominators  of  t.  In  addition,  such  paths  may 
pass  through  other  nodes. 

Recall  that  if  node  h  is  a  loop  header,  then  a  path  through  the  loop  headed  by  /i  is  a 
path  of  length  greater  than  or  equal  to  one.  which  begins  and  ends  at  node  h,  and  which 
does  not  contain  any  nodes  which  are  not  in  the  loop  headed  by  h.  If  n  is  not  a  loop 
header,  wc  v.  ill  say  that  a  path  n  is  n-free  if  and  only  if  node  n  does  not  occur  in  n,  except 
possibly  as  the  first  ncxle  in  the  path.  If  n  is  a  loop  header,  wc  will  say  that  n  is  n-free  if 
and  only  if  7i=po,  where  p  is  either  the  null  path  or  a  path  through  the  loop  headed  by  n, 
and  n  does  not  occur  in  o.  For  our  purposes,  given  a  souac  s  and  a  target  t.  it  will  suffice 


•  The  path  expression  produced  may  also  rcprcseni  some  additional  paths  from  s  lo  i. 
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to  produce  a  path  expression  a  such  that  all  of  the  s-free  paths  from  s  to  t  belong  to 
PATHS(a). 

We  will  present  two  algorithms,  BTl  and  BT2.  BTl  builds  a  path  expression  tree 
representing  the  s-free  paths  from  s  to  t  in  the  case  that  s  dominates  t.  We  first  present 
the  algorithm  for  BTl  and  prove  its  correctness.  We  then  show  why  this  algorithm  fails 
when  s  does  not  dominate  t,  and  present  a  somewhat  more  complicate  algorithm  BT2, 
which  handles  the  general  case.  Finally,  we  argue  that  BT2  is  correct,  analyze  its  com- 
plexity, and  show  that  the  length  of  the  path  expression  produced  is  0(N)- 

We  distinguish  between  three  types  of  nodes  which  may  occur  in  the  flow  graph  G: 
loop  header  nodes,  selection  exit  nodes,  and  regular  nodes.  A  loop  header  node,  n,  has 
exactiy  two  predecessors,  p\  and  p2,  where  p\  is  the  immediate  dominator  of  n,  and  n 
dominates  p  2-  A  selection  exit,  has  at  least  two  predecessors;  if  n  is  a  selection  exit  then 
none  of  n's  predecessors  dominate  n.  A  regular  node  has  at  most  one  predecessor;  if  p  is 
the  predecessor  of  a  regular  node  n,  then  p  is  the  immediate  dominator  of  n.  It  follows 
from  the  structure  of  the  subgraphs  given  in  Figure  2.1.1  that  the  sets  {loop  header 
nodes),  {selection  exit  nodes),  and  {regular  nodes)  form  a  panition  of  the  set  of  nodes. 
The  nodes  in  G  can  easily  be  classified  by  performing  a  depth  first  search  of  G.  For  each 
back  edge,  (ij),  j  is  a  loop  header,  for  each  cross  edge,  (i j),  j  is  a  selection  exit;  all  nodes 
which  are  neither  the  target  of  a  back  edge,  nor  the  target  of  a  cross  edge  are  regular 
nodes. 

The  algorithms  BTl  and  BT2  make  use  of  the  following  three  primitive  functions. 
CONCAT  takes  as  input  two  trees,  7"]  and  T2,  and  returns  the  tree  whose  root  is  labelled 
by  the  symbol  "•",  whose  left  subtree  is  7],  and  whose  right  subtree  is  T2-  UNION  takes 
as  input  two  trees,  7"i  and  T2,  and  returns  the  tree  whose  root  is  labelled  by  the  symbol 
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"\^",  whose  left  subtree  is  Tj ,  and  whose  right  subtree  is  T2.  MAKEVERTEX  takes  as 
input  a  node  symbol  n,  or  a  loop  symbol  p,,  and  returns  the  tree  whose  root  is  labelled  by 
that  symbol,  and  whose  left  and  right  subtrees  are  null. 

Function  BTl  is  given  in  Figure  AII.l. 

function  BTl(s,t:  NodeRange):  Tree; 

(returns  path  expression  tree  representing  set  of  paths  from  s  to  t,  where  s  dominates  t) 

begin 

if  s  does  not  dominate  t  then  Error, 
if  s=l  then  BTl  :=  MAKEVERTEX(X) 
else 
begin 

imm_s_dom  :=  immediate  s-dominator  oft; 
if  t  is  regular  node  then 

BTl  :=  CONCAT(BTl(s,  imm_s_dom),MAKEVERTEX(n,)) 
else  if  t  is  a  loop  header  then 

BTl  :=  CONCAT(BTl(s,  imm_s_dom),MAKEVERTEX(p,)) 
else  (t  is  a  selection  exit) 
begin 

let  / 1  ,...,/;„  be  the  predecessors  of  t; 
{ m  is  greater  than  or  equal  to  2 } 
Tl:=  UNION(BTUimm_s_dom,i  1), 

BTl  (imm_s_dom,j  2 )); 
for  j  :=  3  to  m  do 
begin 

T2  :=T1; 

Tl  :=  UNION(BTl(imm_s_dom,/p,T2); 
end; 
{*)  T2:=C0NCAT(Tl,MAKEVERTEX(n,)); 

BTl  :=  CONCAT(BTl(s.  imm_s_dom).T2); 
end 
end 
end. 

Figure  AII.l 

Roughly  speaking.  BTl  involves  climbing  upwards  through  the  tree  of  immediate-s- 
dominators  from  t  to  s  Whenever  a  selection-exit  n  is  encountered,  BTl  calls  itself 
recursively  to  build  subtfccs  representing  the  sets  of  paths  from  the  entrance  of  the  sclcc- 
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tion  subgraph  to  the  predecessors  of  n,  then  "unions"  these  subtrees  together. 

THEOREM 

Let  s  dominate  t.  Let  a  be  the  path  expression  represented  by  the  tree  returned  by 
BTl(s,t).  Then  if  node  s  is  not  a  loop  header,  the  path  expression  n^-a  represents 
the  set  of  s-free  paths  from  s  to  t,  and  if  node  s  is  a  loop  header,  the  path  expression 
p^-a  represents  the  set  of  s-free  paths  firom  s  to  t. 

PROOF: 

Let  C'  be  the  subgraph  of  C^  consisting  of  those  nodes  from  which  t  is  reachable. 
The  proof  is  by  course-of-values  induction  on  the  size  of  the  G^'. 

Base  Case:  IG*''I=1. 

If  \G^''\=\,  then  s=t,  so  BTl(s,t)  consists  of  a  single  vertex  labelled  by  X,  and  a=X. 
PATHS(n^-a)=PATHS(ni)={(«^)),  which  is  the  set  of  paths  from  s  to  t. 

Inductive  case: 

Suppose  the  theorem  holds  for  all  5'  and  r'  such  that  s'  dominates  r'  and  \G  '  \<  k. 
We  show  that  the  theorem  holds  for  s  and  t  such  that  C^''=  k.  We  will  assume  that  s  is 
not  a  loop  header.  The  proof  when  s  is  a  loop  header  is  identical  except  that  occurrences 
of  the  symbol  n^  are  replaced  by  the  symbol  p^. 

case  1 :  t  is  a  regular  node. 

Let  d  be  the  immediate  s-dominator  of  t.  Every  s-free  path  from  s  to  t  is  of  the  form 
Kn,  where  7t  is  an  s-free  path  from  s  to  d.  The  function  BTW(s,t)  returns  a  tree  represent- 
ing the  path  expression  a=P-n,,  where  P  is  the  path  expression  represented  by  BTl(s,d). 
So  ni-a=n^-p-n,.  By  the  inductive  hypothesis,  n^-P  represents  the  set  of  s-free  paths 
from  s  to  d.  Therefore, 
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PATHS(ni-a)={7i«,l  t:  is  an  s-free  path  from  s  to  d}={s-free  paths  from  s  to  t). 

case  2:  t  is  a  loop  header. 

Let  d  be  the  immediate  s-dominator  of  t.  Every  s-free  path  from  s  to  t  is  of  the  form 
Ttp  where  7t  is  an  s-free  path  from  s  to  d  and  p  is  a  path  through  the  loop  headed  by  node 
t.  The  function  BTW(s,t)  returns  a  tree  representing  the  path  expression  a=p-p,,  where 
[i=BTl(s,d).  So  n^-a=n^Pp,.  By  the  inductive  hypothesis,  n^-P  represents  the  set  of  s- 
free  paths  from  s  to  d.  By  definition,  p,  represents  the  set  of  through  the  loop  headed  by 
t.  Therefore,  PATHS(n^-a)={ s-free  paths  from  s  to  t}. 

case  3:  t  is  a  selection  exit. 

Let  d  be  the  immediate  dominator  of  t.  Note  that  d  is  the  entrance  to  the  selection 
subgraph  corresponding  to  t.  Since  s  dominates  t,  s  also  dominates  d,  so  d  is  the  immedi- 
ate s-dominator  of  t.  Let  / 1  ,.../^  be  the  predecessors  of  t.  Note  that  d  dominates  each  of 
the  ij.  Every  s-free  path  from  s  to  t  is  of  the  form  pdat,  where  pd  is  a  s-free  path  from  s 
to  d,  and  do  is  a  d-free  path  from  d  to  one  of  the  ij. 

By  the  inductive  h>'pothesis,  for  each  j,  l<j<m,  the  path  expression  a^  represented 
by  the  tree  BTKd,/^),  has  the  property  that  nj-a^  represents  the  set  of  d-free  paths  from  d 
to  ij.  So  after  executing  the  statement  marked  by  [*j,  T2  represents  a  path  expression  (3, 
where  n^-^  represents  the  set  of  d-free  paths  from  d  to  t.  By  the  inductive  hypothesis, 
BTKs.d)  represents  a  path  expression  y  where  n,-Y  represents  the  set  of  s-frce  paths  from 
s  to  d.  So  BTl(s,t)  represents  the  path  expression  yP  where  n,-y-p  represents  the  set  of 
s-free  paths  from  s  to  t.B 

When  s  does  not  dominate  i,  the  situation  becomes  more  complicated.  Intuiuvely, 
this  is  because  the  graph  G*  is  not  necessarily  a  structured  flow  graph.   In  this  situation 
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the  immediate  s-dominators  of  a  ncxie  is  not  necessarily  the  same  as  its  immediate  domi- 
nator. 

Consider  the  flow  graph  shown  in  Figure  All. 2. a.  Let  s  be  node  6.  The  tree  of 
immediate  6-dominators  is  shown  in  Figure  An.2.b.  Those  nodes  which  are  loop  headers 
(in  G)  have  been  marked  with  an  asterisk,  and  those  which  are  selection  exits  (in  G)  are 
marked  with  a  plus  sign. 

Let  t  be  node  10.  Climbing  upward  through  the  tree  of  immediate-6-dominators 
from  10  to  6  yields  the  path  expression  a=6-8p4-910.  This  path  expression  represents 
some,  but  not  all,  of  the  paths  firom  node  6  to  node  10.  It  does  not  represent  those  paths 
which,  after  visiting  node  9,  "bypass"  node  10,  going  instead  to  nodes  11  and  12,  then 
around  the  loop  headed  by  node  2  zero  or  more  times,  then  back  to  nodes  3,  4,  and  9.  We 
can  view  the  problem  as  arising  from  the  fact  that  node  4  can  be  preceded  either  by  node 
8,  which  immediately-s-dominates  node  4,  or  by  node  3,  which  does  not  s-dominate  node 
4.  Note  that  the  immediate  6-dominator  of  node  4  and  the  immediate  dominator  of  node 
4  differ.  If  we  replace  the  symbol  P4  in  a  by  the  sub-path-expression 

(p4-9(10ull)12p2-3,^X)p4, 
which  represents  the  set  of  paths  from  node  4  to  node  4,  we  obtain  the  expression 

6-8-  (p4-9(10,^ll)12p2-3,^>.)p4   ^-lO, 
which  represents  the  set  of  paths  from  node  6  to  node  10,  as  desired. 

A  similar  problem  arises  in  the  flow  graph  shown  in  Figure  AII.3.a.  Let  s  be  node  4 
and  let  t  be  node  8.  The  tree  of  immediate  4-dominators  is  shown  in  Figure  AII.3.b. 
Climbing  up  the  tree  from  node  8  to  node  4  gives  the  path  expression  4-6-7-8,  which 
represents  some,  but  not  all  of  the  paths  from  4  to  8.  If  we  replace  the  symbol  6  by  the 
sub-paih-expression 
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6(7(8u9)10  p2-3(4,^5)-6»^X), 

which  represents  the  set  of  paths  starting  and  ending  at  ncxie  6,  we  obtain  the  path  expres- 
sion 4■6■(7•(8^^9)•10•p2•3•(4^^5)•6^^X)•7•8,  which  represents  the  set  of  paths  from 
node  4  to  node  8,  as  desired. 

We  capture  these  idea  algorithmically,  as  follows.  As  before,  BT2  involves  climb- 
ing upwards  through  the  tree  of  immediate  s-dominators  from  t  to  s.  If  the  node  being 
processed  is  a  regular  node,  a  loop  header  whose  immediate  s-dominator  is  the  same  as 
its  immediate  dominator,  or  a  selection  exit  whose  immediate  s-dominator  is  the  same  as 
its  immediate  dominator,  then  BT2  will  behave  like  BTl.  However  if  the  node  n  being 
processed  is  a  loop  header  or  selection  exit  whose  immediate  s-dominator  and  immediate 
dominator  differ,  it  may  be  necessary  to  take  special  action,  in  order  to  include  a  sub- 
path-expression  representing  the  set  of  n-free  paths  from  n  to  n.  As  we  shall  see  below, 
this  special  action  is  not  always  necessary,  because  the  sub-path-expression  produced 
may  be  redundant.  We  will  present  the  algorithm  for  function  BT2,  after  some  prelim- 
inary definitions. 

Let  s  be  a  node  and  let  n  be  a  loop  header  node  in  G.  Let  e  be  the  exit  node  of  the 
loop  headed  by  n  and  let  d  be  the  immediate  dominator  of  n.  If  d  is  reachable  from  s  and 
e  s-dominates  n,  we  will  call  n  an  s-inverted  loop  header.  Note  that  if  n  is  s-inverted  then 
s  is  in  the  loop  headed  by  n  and  s^'n.  If  n  is  s-invened  then  the  immediate  dominator  of  n 
and  the  immediate  s-dominator  of  n  are  different  nodes.  Thus,  the  loop  in  G^  headed  by 
n  is  different  than  the  loop  in  G  headed  by  n. 

LEMMA  All  7: 

Let  n  be  an  s-inverted  loop  header  and  let  d  be  its  immediate  dominator.  Then  there  is 
exactly  one  loop  header  h  such  that 
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1)  n  is  in  the  body  of  the  loop  headed  by  h,  and 

2)  n  properly  s-dominates  h  and  h  s-dominates  d. 

Note  that  h  may  actually  be  d  itself.  We  will  call  h  the  header  corresponding  node  n. 

PROOF: 

Since  n  is  s-inverted,  s  is  in  the  body  of  the  loop  headed  by  n.  Otherwise,  every 
path  from  s  to  e  would  pass  through  n.  Since  d  is  reachable  from  s  there  must  be  some 
loop  which  contains  both  d  and  s  (see  Figure  All. 4.)  Let  h  be  the  header  of  the  innermost 
such  loop.  It  is  immediately  evident  from  the  figure  that  h  satisfies  1)  and  2). 

Let  h'  ^hbt  another  header  which  satisfies  1).  Then  the  loop  headed  by  h  is  nested 
within  the  loop  headed  by  h'.  There  is  a  path  from  s  to  d  which  does  not  pass  through  h', 
so  h'  does  not  satisfy  2)M 

We  will  now  present  the  second  algorithm,  BT2,  and  argue  that  it  is  correct. 
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procedure  BT2(s 

l,t:  NodeRange;  var  HeaderHit:  boolean;  var  T:  Tree); 

var  imm. 

_dom,  imm_s_dom  :NodeRange; 

begin 

{1}     ifs 

1  dominates  t  then  T  :=  BTl(sl,t) 

{2}     else  begin 

{3} 

imm. 

.dom  :=  immediate  dominator  of  t; 

{4} 

imm_ 

.s_dom  :=  immediate  s-dominator  of  t; 

{5) 

if  t  is  a  regular  node  then 

begir 

1 

{6} 

BT2(sl,imm  s  dom,HeaderHit,Tl); 

{7} 

end 

T  :=  CONCAT(Tl,MAKEVERTEX(n,)) 

{8} 

else  i 
begir 

f  t  is  a  loop  header  then 

1 

{9} 

mark  that  t  has  been  visited; 

{10} 

if  (t  is  inverted  loop  header)  then 
begin 

{11} 

HeaderHit  :=  true; 

{12} 

if  the  header  associated  with  t  has  not  yet  been  visited  then 
begin 

{13} 

BT2(t,imm_dom,HeaderHit,Tl); 

{14} 

T2  :=  UNION 

(CONCAT(MAKE  VER 1  hX(p,  ),T1 ) 
MAKEVERTEX(X)); 

{15} 

T3  :=  C0NCAT(T2,MAKEVERTEX(p,)); 

{16} 

BT2(sl,imm  s  dom,HeaderHit,T4); 

{17} 

T  :=  CONCAT(T4,T3); 
end 

{18} 

else 
begin 

{19} 

BT2(sl,imm  s  dom,HeaderHit,Tl); 

{20} 

T  :=  CONCAT(Tl,MAKEVERTEX(p,)); 
end 
end 

{21} 

else 
begin 

{22} 

BT2(sl,imm  s  dom,HeaderHit,Tl); 

{23} 

end 

T  :=  CONCAT(Tl,MAKEVERTEX(p,)); 
end 

{24} 

else  {  t  is  a  selection  exit  } 

{25} 

if  imm_dom  s-dominates  t  then 

{26} 

BT2(sl,imm  s  dom,HeaderHit,Tl); 

{27} 

T  :=  CONCAT(Tl,BTl(imm_dom,t)) 

{28} 

else  if  t  s-dominates  imm_dom  and  not  HeaderHit  then 
begin 

(29) 
{30} 

Tl  :=  BTl(imm_dom,t); 
BT2(t,imm_dom,HeaderHit,T2); 
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{31} 

T3  :=C0NCAT(T2,T1); 

{32} 

T4  :=  UNI0N(MAKEVERTEX(X),T3); 

{33} 

T5  :=  CONCAT(MAKEVER  lhX(n,),T4); 

{34} 

BT2(sl,imm  s  dom,HeaderHit,T6); 

{35} 

end 

T  :=  CONCAT(T6,T5) 

{36} 

else 

begin 

{37} 

BT2(sl,imm  s  dom,HeaderHit,Tl); 

{38} 

end 

T  :=  CONCAT(Tl,MAKEVERTEX(n,)) 

end 

end. 

function  BuildTree(G,s,t :  NodeRange)  :  Tree; 

begin 

{initialize  global  variables} 

build  tree  of  immediate  dominators; 

build  tree  of  immediate  s-dominators; 

classify  each  node  as  being  a  regular  node,  a  loop  header,  or  a  selection  exit; 

mark  each  s-inverted  loop  header  as  such  and  find  the  associated  loop  header; 

mark  each  loop  header  as  unvisited; 

HeaderHit  :=  false; 

(call  procedure  BT2  to  build  tree  T=7", 

where  n^-a  represents  the  set  of  candidates  corresponding 

to  the  definition-use  pair  (s,t)} 
BT2(s,t,HeaderHit,T); 

BuildTree  :=  CONCAT(MAKEVERTEX(n,),T); 
end; 

We  now  argue,  somewhat  informally,  that  BuildTree(G,s,t)  returns  a  tree  represent- 
ing a  set  of  paths  which  contains  all  of  the  s-free  paths  from  s  to  t,  as  desired. 

When  BT2  is  called  from  BuildTree,  the  effect  is  to  "climb"  upward  (toward  the 
root)  through  the  tree  of  immediate  s-dominators  from  t  to  s,  backing  down  the  tree  under 
certain  special  circumstances. 

We  first  discuss  the  cases  where  the  mimediate  dominator  and  immediate  s- 
dominaior  of  the  current  node  are  the  same.   If  the  current  node  t  is  a  regular  node,  then 


-  189- 

the  algorithm  "emits"  a  vertex  labelled  by  n,  (line  7)  and  continues  to  climb  up  through 
the  tree  (line  6).  Similarly,  if  the  t  is  a  non-inverted  loop  header  node,  then  the  algorithm 
"emits"  a  vertex  labelled  by  p,  (line  23)  and  continues  to  climb  up  through  the  tree  Gine 
22).  If  the  t  is  a  selection  exit  whose  immediate  dominator,  d,  and  immediate-s- 
dominator  are  the  same  node,  then  the  algorithm  "emits"  a  subtree  representing  the  d-free 
paths  from  d  to  t  Qine  27)  then  continues  to  climb  up  through  the  tree  (line  26).  In  each 
of  these  cases,  an  argument  similar  to  the  ones  made  in  the  proof  of  correctness  of  BTl 
could  be  made  to  show  that  the  resulting  path  expression  includes  all  of  the  s-free  paths 
from  si  to  t. 

We  next  consider  what  happens  when  the  current  node,  t,  is  an  inverted  loop  header. 
Let  d  be  the  immediate  dominator  of  t,  let  e  be  the  other  predecessor  of  t  (the  exit  node  of 
the  body  of  the  loop  headed  by  t,  hence  the  immediate-s-dominator  of  t)  let  h  be  the 
header  associated  with  t,  and  let  n  be  the  successor  of  t  which  is  not  in  the  loop  (see  Fig- 
ure All. 5.) 

If  node  h,  the  header  corresponding  to  t,  has  not  yet  been  visited,  then  the  algorithm 
returns  a  tree  corresponding  to  the  path  expression  T=P•(p,•a^^X,)•p,  where  P  comes 
from  the  recursive  call  to  BT2  in  line  16  and  a  comes  from  the  recursive  call  to  BT2  in 
line  13. 

The  s-free  paths  from  s  to  t  are  all  of  the  form  Tip,  where  7t  is  an  s-free  path  from  s 
to  e  and  p  is  either  of  the  form  eox^  or  to,  where  a  and  ^  are  s-free  paths  through  the 
loop  headed  by  t  and  i  is  an  s-free  path  from  n  to  d.  Any  s-free  path  from  s  to  t  can  be 
decomposed  in  such  a  way  that  kg  PATHS(p)  and  pe  PATHS((p,-a^^X)p,). 

If  node  h,  the  header  corresponding  to  t,  has  already  been  visited,  then  a  subtree 
labelled  by  a  loop  symbol  representing  the  outermost  loop  header  which  is  reachable 
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from  s  and  which  reaches  the  original  target  has  already  been  emitted.  All  of  the  paths 
from  t  to  d  are  subpaths  of  paths  represented  by  this  loop  header.  So  it  would  be  redun- 
dant to  emit  the  path  expression  (P;a\^X)p,).  Thus,  in  lines  22  and  23  the  algorithm 
emits  a  path  expression  representing  the  s-free  paths  which  go  from  s  to  e,  then  through 
the  loop  headed  by  t. 

If  the  current  node  t  is  a  selection  exit,  there  are  four  cases  to  consider.  Let  id 
represent  the  immediate  dominator  of  t  and  let  isd  represent  the  immediate  s-dominator 
of  t.  If  id=isd,  then  s  dominates  id,  and  the  situation  is  similar  the  situation  treated  in 
algorithm  BTl.  BT2  emits  a  path  expression  a- (3,  where  a  (the  path  expression 
corresponding  to  the  tree  returned  by  the  recursive  call  in  line  22)  represents  s-free  paths 
from  s  to  id  and  (3  (the  path  expression  corresponding  to  the  tree  returned  by  the  call  to 
BTl  in  line  23)  represents  paths  through  the  selection  subgraph. 

If  id=isd  then  node  s  is  in  the  selection  subgraph  whose  exit  node  is  t.  This  sub- 
graph has  a  different  "shape"  in  G'^  than  in  G.  If  t  s-dominates  id  {line  28},  then  there  is 
a  path  from  s  to  id.  Each  s-free  path  from  s  to  t  is  of  the  form  7tp,  where  K  is  n  s-free  path 
from  s  to  t  and  p  is  a  path  from  t  to  t.  Any  path  from  t  to  t  is  either  the  null  path  or  a  path 
oC  where  a  goes  from  t  to  id  and  I,  goes  from  id  to  t.  The  path  expression  corresponding 
to  the  tree  built  in  lines  29  to  35  represents  these  paths.  Intuinvely,  in  lines  29  and  30  the 
algorithm  "backs  down"  the  tree  of  immediate  s-dominators. 

If  the  variable  HeaderHii  has  already  been  set  to  true  when  control  reaches  line  28, 
then  the  outermost  loop  header  containing  s  and  the  original  target  node  has  already  been 
emitted,  so  the  path  expression  representing  paths  from  t  to  t  is  redundant,  this  case  the 
algorithm  emits  a  path  expression  (in  lines  37  and  38)  representing  the  s-free  paths  which 
go  "directly"  from  s  to  t.  without  "backing  down"  the  tree  of  immediate  s-dominators. 
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Similarly,  if  id  does  not  s-dominate  t  and  t  does  not  s-dominate  id,  then  id  is  not 
reachable  from  s,  so  the  algorithm  emits  a  path  expression  (in  lines  37  and  38)  represent- 
ing the  s-free  paths  which  go  "directly"  from  s  to  t,  without  "backing  down"  the  tree  of 
immediate  s-dominators. 

In  the  execution  of  BuildTree(G,s,t)  no  vertex  in  the  tree  of  immediate  s-dominators 
is  visited  more  that  twice.  This  is  because  the  only  way  a  vertex  which  has  already  been 
visited  can  be  visited  again  is  if  the  algorithm  "backs  down"  the  tree  of  immediate-s- 
dominators  in  line  13  or  line  30.  When  the  recursive  call  in  line  30  is  performed,  an 
inverted  loop  header  node  which  is  s-dominated  by  t  and  which  s-dominates  id  is 
guaranteed  to  be  visited,  at  which  point  HeaderHit  will  be  set  to  true.  So  line  30  is  exe- 
cuted at  most  once  in  the  execution  of  BuildTree.  When  line  13  is  executed,  the  headers 
corresponding  to  inverted  loop  header  t  and  to  all  of  the  other  inverted  loop  headers  in 
whose  loop  t  lies  are  visited.  Thus  in  the  execution  of  BuildTree,  each  edge  (id,t)  is 
backed  down  at  most  once.  It  follows  that  the  length  of  the  path  expression  produced  by 
BuildTree  is  0(N)  and  that  the  running  time  of  BuildTree  is  0(N)  where  N  is  the  number 
of  nodes  in  G. 
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APPENDIX  III: 

In  this  appendix  we  present  a  script  of  a  session  with  an  implemented  prototype  of 
the  path  expression  heuristic.  This  implementation  is  interactive.  It  allows  the  user  to 
select  the  loop  symbol  on  which  to  concentrate.  It  also  requires  that  the  user  perform  the 
partial  symbolic  evaluation  and  theorem  proving  by  hand.  The  implemented  algorithm 
differs  in  certain  minor  ways  from  the  algorithm  described  in  chapter  5. 

In  this  session  we  examine  the  association  (2,(6, 14), switch)  from  the  bubblesort 
program  in  Figure  5.2.2.  This  is  the  same  association  which  was  examined  in  Examples 
5.2.2  and  5.5.1. 

Text  entered  by  the  user  is  printed  in  bold  face.  Comments  are  in  italics.  The  sym- 
bol F  stands  for  0  and  the  symbol  L  stands  for  X. 

Script  started  on  Tue  Sep  22  21:10:05  1987 

csd27>  heuristic 

Would  you  like  to  check  another  def-use  association?(Y/N) 

==>y 

Enter  the  source  and  target  node 

2  14 

Enter  the  variable 

switch 

The  initial  path  expression  is 

N2.P3.N5.P6.N14 

Select  one  of  the  following  actions: 
A:  Expand  a  loop  symbol 
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B:  Toggle  parentheses 

C:  Print  current  path  expression 

D:  Print  Always/Never  info  for  subexpression 

E:  Give  up 

===>a 

Enter  position  of  a  loop  symbol 

3 

User  directs  system  to  first  concentrate  on  the  symbol  P3, 

which  is  the  third  symbol  in  the  current  path  expression. 

Expanding  loop  symbol  P3. 

Use  partial  symbolic  evaluation  to  derive  the  panial 
path  condition  E1&  ...  &Ek  for  the  path  expression 

botP3.N5.P6.N14top 
which  follows  the  loop  symbol. 

Derive  a  predicate  which  is  always  true  when  control 
reaches  the  top  of  this  occurrence  of  P3 
by  panially  symbolically  evaluating  the  path  expression 
botNl.N2.P3top 

For  each  i. 

If  (Ei  &  C)  is  unsatisfiable.  let  MRDi  be  the 
set  of  variables  occurring  in  Ei. 
Otherwise  MRDi  is  empty. 
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Enter  the  non-empty  sets  MRDi 

Enter  variables  in  the  next  MRDi  set,  one  per  line. 
Type  "."  to  end  the  list  of  variables  in  this  set. 
Enter  "*"  to  end  the  list  of  sets. 


Enter  variables  in  the  next  MRDi  set,  one  per  line. 
Type  "."  to  end  the  list  of  variables  in  this  set. 

Enter  "*"  to  end  the  list  of  sets. 

* 

User  enters  the  set  (i)  because  the  predicate  i>5 

is  necessary  on  exit  from  the  loop  and  is  unsatisfiable  before 

entry  into  the  loop.  Thus  i  must  change  value  on  any  executable 

path  through  the  loop. 

Expanding  loop  symbol 

The  simplified  subexpression  representing  the  loop  is 
((N3.N4)-i-  U  L).N3 

The  new  path  expression  is 
N2.((N3.N4)+  U  L).N3.N5.P6.N14 

Select  one  of  the  following  actions: 
A:  Expand  a  loop  symbol 
B:  Toggle  parentheses 
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C:  Print  current  path  expression 

D:  Print  Always/Never  info  for  subexpression 

E:  Give  up 

===>a 

Enter  position  of  a  loop  symbol 

14 

The  loop  symbol  P6  is  the  14th  symbol  in  the  path  expression 

(not  counting  parentheses.) 

Expanding  loop  symbol  P6. 

Use  panial  symbolic  evaluation  to  derive  the  partial 
path  condition  E1&  ...  &Ek  for  the  path  expression 

botP6.N14top 
which  follows  the  loop  symbol. 

Derive  a  predicate  which  is  always  true  when  control 
reaches  the  top  of  this  occurrence  of  P6 
by  partially  symbolically  evaluating  the  path  expression 
botNl.N2.{(N3.N4)+  U  L).N3.N5.P6iop 

For  each  i, 

If  (Ei  &  C)  is  unsatisfiablc.  let  MRDi  be  the 

set  of  variables  occurring  in  Ei. 

Otherwise  MRDi  is  empty. 

Enter  the  non-empty  sets  MRDi 
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Enter  variables  in  the  next  MRDi  set,  one  per  line. 

Type  "."  to  end  the  list  of  variables  in  this  set. 

Enter  "*"  to  end  the  list  of  sets. 

n 

switch 

Enter  variables  in  the  next  MRDi  set,  one  per  line. 
Type  "."  to  end  the  list  of  variables  in  this  set. 

Enter  "*"  to  end  the  list  of  sets. 

* 

n  or  switch  must  be  redefined  on  any  executable  path 

through  the  loop 

Expanding  loop  symbol 

Pruning  the  expression.  Variable 

switch  is  always  defined  in  the  sub-expression 

(N6.N7.P8.N13H 

The  simplified  subexpression  representing  the  loop  is 

N6 

Each  path  through  the  body  of  the  loop  has  a  definition  of  switch 

None  of  the  variables  in  the  set 

{  switch  n  ) 

are  defined  in  the  subexpression 

N6 
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The  new  path  expression  is 
F 

No  more  candidates  for  unexecutable  def-clear  path. 

The  association  is  unexecutable 

Would  you  like  to  check  another  def-use  association?(Y/N) 

==>  n 

csd27>  *D 

script  done  on  Tue  Sep  22  21:15:41  1987 
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